Abstract. In this paper we study the distribution tails and the moments of $\mathscr{C}(A)$ and $\log \mathscr{C}(A)$, where $\mathscr{C}(A)$ is a condition number for the linear conic system $Ax \le 0$, $x \ne 0$, with $A \in \mathbb{R}^{n \times m}$. We consider the case where $A$ is a Gaussian random matrix. For this input model we characterise the exact decay rates of the distribution tails, we improve the existing moment estimates, and we prove various limit theorems for the cases where either $n$ or $m$ and $n$ tend to infinity. Our results are of complexity theoretic interest, because interior-point methods and relaxation methods for the solution of $Ax \le 0$, $x \ne 0$ have running times that are bounded in terms of $\log \mathscr{C}(A)$ and $\mathscr{C}(A)^2$ respectively.
1 Introduction

Let $A \in \mathbb{R}^{n \times m}$ be given and consider the two systems¹

    $Ax \le 0,\ x \ne 0$    (1)
number CityU 1085/02P).
¹Usually, in the literature, one considers an $m \times n$ matrix $A$ appearing in (2), the "primal system", and its transpose $A^{\mathsf T}$ appears in (1), the "dual system". We reverse this notation here since in most of this paper we will deal with system (1) and we do not want to burden the notation with the transpose superscript.
and

    $A^{\mathsf T} y = 0,\ y \ge 0,\ y \ne 0.$    (2)



It is well-known that, if A is full row rank, one of these systems has a strict solution (one for which the 
satisfied inequality is strict in all coordinates) if and only if the other has no solutions at all. 

The following feasibility problem is a standard subproblem in linear programming: 

(FP) Given an $n \times m$ full row rank real matrix $A$, decide which of (1) or (2) is strictly feasible
and return a strict solution for it.

Iterative algorithms that solve this problem, such as variants of interior-point or ellipsoid methods, have a cost which depends on some measure of conditioning of the matrix $A$ besides the natural dependence on $n$ and $m$. For instance, a finite-precision algorithm solving the problem above is analysed in [14] to show a complexity of

    […]    (3)

Here $C(A)$ may be either the condition number introduced by Renegar in [32, 33, 34], which we denote by $C_R(A)$, or a generalisation of Goffin's "inner measure" [20, 21] introduced in [10], which we denote by $\mathscr{C}(A)$ in the sequel.

Another family of algorithms whose complexity can be analysed in terms of $\mathscr{C}(A)$ are the many variants of the Agmon-Motzkin-Schoenberg (AMS) relaxation method [2, 29] for solving systems of linear inequalities. This includes the cyclic projection method, the perceptron algorithm and certain types of subgradient algorithms. The complexity of those methods is typically proportional to $\mathscr{C}(A)^2$. For example, for feasible systems the perceptron algorithm is guaranteed to find a solution to $Ax \le 0$ in

    $O(\mathscr{C}(A)^2)$    (4)

iterations (see Appendix B for further remarks). Although less interesting from a complexity perspective, such algorithms have appealing aspects in applications where $m$ and $n$ are both very large, for example in tumour radiation therapy planning. AMS relaxation is also the historic context in which the condition number $\mathscr{C}(A)$ was first studied, albeit only in the case of feasible systems, see [20, 21].

It is also worthwhile mentioning that there exist close links between the condition number $\mathscr{C}(A)$ and the notion of margin that plays a key role in the learning theory literature. Furthermore, the condition number $\mathscr{C}(A)$ has applications in backward error analysis and in estimating the stability of linear feasibility problems. For all these reasons, studying the moments of both $\log \mathscr{C}(A)$ and $\mathscr{C}(A)$ will be interesting in the probabilistic setting outlined below.

Unlike $m$ and $n$, the condition number of $A$ is not immediate to determine from the data $A$ and seems to require a computation which is not easier than solving the feasibility problem instance described by $A$ (see [31] for a discussion). In addition, there are no bounds on its magnitude as a function of $m$ and $n$. It may actually be infinite. Thus, bounds such as (3) and (4) tell us little about the running time we can expect for a given input $A$.

A reasonable way to cope with this situation is to assume a probability measure on the space of $n \times m$ matrices $A$ and to estimate the expected value of $\log \mathscr{C}(A)$ (or that of $\mathscr{C}(A)$). A standard choice of distribution is the Gaussian model. We say that a random matrix is Gaussian when its entries are i.i.d. random variables with standard normal distribution. The main result in [11] shows that if $A$ is a Gaussian $n \times m$ matrix, then

    […]    (5)

and 



$E[\log C_R(A)] \le E[\log \mathscr{C}(A)]$ +



5 log n log m 



+ 2 log 2. 
In most practical occurrences of the feasibility problem (FP) one deals with matrices for which $n$ is some orders of magnitude larger than $m$. The case $n \gg m$ ($n$ much larger than $m$) is actually the case of interest among researchers in linear programming. In [16] the estimate (5) was refined to prove that, when $n$ is moderately larger than $m$, one has $E[\log \mathscr{C}(A)] \le \max\{\log m, \log\log n\} + O(1)$.

The main contributions of this paper are the following: 

(i) We further strengthen the existing bounds on $E[\log \mathscr{C}(A)]$: in Corollary 8 we show that if $n > m$

    $E[\log \mathscr{C}(A)] \le m \log 2$

asymptotically as $m \to \infty$, and in Corollary 9 we show that if $n > 5m$ and $\gamma > 0$ is an arbitrary constant, then

    […]

asymptotically as $m \to \infty$.

(ii) We generalise the bounds to arbitrary moments of $\log \mathscr{C}(A)$ and derive similar bounds for the moments of $\mathscr{C}(A)$: see Corollaries 2, 3 and 4.

(iii) We derive various limit theorems that are of interest in large-scale problems: in the case where $m$ is fixed and $n \to \infty$, we show in particular that

    $\mathscr{C}(A) \xrightarrow{\;p\;} 1$,
    $\lim_{n \to \infty} E[(\log \mathscr{C}(A))^{\gamma}] = 0 \quad \forall \gamma > 0$,
    $\lim_{n \to \infty} E[(\mathscr{C}(A))^{\gamma}] = 1 \quad \forall \gamma \in [0, 1)$,

see Corollaries 5, 6 and 7. In the case where $n > m$ and $m \to \infty$, it is again Corollaries 8 and 9 that bound the asymptotic growth rates of $E[\log \mathscr{C}(A)]$.

(iv) Previous probabilistic analyses of $\log \mathscr{C}(A)$ relied on the fact that linear algebraic operations applied to Gaussian matrices (including vectors as a special case) lead again to Gaussian matrices. The analysis presented here is very different because it is based on geometry on high dimensional spheres. This approach makes it possible to derive not just upper bounds on the tail decay of $\log \mathscr{C}(A)$, but also lower bounds: Theorems 1 and 2 show that there exist functions $c(m, n) > d(m, n)$ of $m$ and $n$ such that

    $\frac{d(m,n)}{t} \le P[\mathscr{C}(A) > t] \le \frac{c(m,n)}{t}$

for all $t$ large enough. This implies that the distribution tails of $\log \mathscr{C}(A)$ asymptotically decay exactly at the exponential rate $e^{-t}$, see Corollary 1. More importantly, the geometric analysis we developed here generalises to almost arbitrary probability measures that are absolutely continuous with respect to the uniform measure on the sphere, as we will show in a follow-up paper. In the general case, the tail decay rates of $\log \mathscr{C}(A)$ are again exponential, but the exponent depends on a parameter defined as a function of the distribution. This exponential decay is perhaps the most important conclusion of our analysis, as it explains why the polyhedral feasibility problem - and linear programming by extension - is "empirically strongly polynomial". We discuss the relevance of this notion in Section 2.



2 Complexity Theoretic Context 

2.1 Background

The interest in the distribution of $\log \mathscr{C}(A)$ stems to a large extent from the conjecture that there exist so-called strongly polynomial algorithms for linear programming. Let us give a brief explanation for the uninitiated reader: since the mid-1940s, variants of Dantzig's simplex method proved to be efficient algorithms for solving linear programming problems in practice, despite the fact that in the worst case these methods terminate only after a number of iterations that is exponential in the "size", or the total input data length of the problem. As interest in complexity theory grew, many researchers believed that a good algorithm should terminate within a number of iterations that is bounded by a polynomial in the input size. Thus, the simplex method is not a polynomial algorithm.

Surprising new approaches to linear programming subsequently proved to be polynomial time algorithms for linear programming under the Turing model: Khachiyan's ellipsoid method [25], Karmarkar's method [24] and the many interior-point methods developed since then are algorithms of this kind. These algorithms are guaranteed to terminate in polynomial time when the input data are rational and the input size is measured by the total bit-length of the data.

On the other hand, under the so-called real complexity model one considers linear programming problems whose input data are real numbers and imagines a hypothetical computer that can perform operations on real numbers. In this model the complexity of an algorithm is the number of operations that are needed in the worst case to solve the problem. Such an algorithm is called strongly polynomial if its complexity is a polynomial function in the number of constraints and variables (the "dimension") of the underlying problem. Neither the ellipsoid method nor any of the known interior-point algorithms for linear programming is known to be strongly polynomial. In fact, their running time is theoretically unbounded! This is in stark contrast to the simplex method which is guaranteed to terminate in exponential time.

The situation is not hopeless, however, for the real complexity of ellipsoid and interior-point methods can be bounded by a polynomial in the problem dimension and the logarithm of a condition number, see the results cited in the introduction. Earlier relevant papers on this subject and on other applications of LP condition numbers include (among others) [32, 33, 34, 19, 18, 45, 14, 43, 44].

This condition-based complexity analysis is not new. In numerical linear algebra it occurs, for instance, in the analysis of the conjugate gradient method (cf. [30, 42]); in polynomial equation solving it occurs in the analysis of homotopy methods [36] or of grid-based methods [15, 13]. A recent survey for linear programming is [12]. A conceptually related idea in discrete mathematics is that of parameterized complexity [17].

The question of whether linear programming is strongly polynomial time solvable is considered an important open problem and has many ramifications within complexity theory; in his list of 18 mathematical problems for the XXI century [38] Steve Smale includes this question as Problem 9.

An interesting approach to get around the difficult issue of strong polynomiality of linear programming is the average case analysis of algorithms. The average case analysis reveals that linear programming is strongly polynomial time solvable on average, that is, there exist algorithms whose average running times are polynomial under the real model when the input data are normally distributed. The simplex method has been known to possess this property since the early 1980s [6, 7, 8, 37]. Similar work was continued in [1, 27, 39] and [9]. More recently, the attention has shifted to the average case analysis of interior-point methods [3, 40, 28, 22] and [23]. While all of these papers followed an analysis pertaining to particular algorithms, it is also possible to derive similar results by analysing the expected value of condition numbers under Gaussian (or other) input data. Combined with a condition-based complexity analysis this yields average case running time bounds for particular algorithms. This was the approach pursued in [41, 11] and [16], and it is also the approach we pursue in the present paper.

We should point out that the relevance of average case analyses is subject to some justified scrutiny 
which we will further address and respond to in the next paragraph. 

2.2 Discussion: Relevance of Our Results



As mentioned in the synopsis of the introduction, the results we will derive in this paper include in particular polynomial bounds of $E[\log \mathscr{C}(A)]$ in $m$ and $n$. In the literature on the probabilistic analysis of linear programming such results are considered important because they show that LP is "strongly polynomial on average". Precisely how significant are such statements from a complexity theoretic viewpoint?

On the one hand, the average behaviour of an algorithm on random input data yields in itself an interesting complexity measure which, as many would argue, can be more relevant than the study of the worst case scenario. The weakness of this argument is that the relevance of average case results depends on how well the assumed probabilistic model describes the distribution of the input data that one might observe in particular applications. Without doubt, uniform or Gaussian matrices are an inadequate model in all but a few applications. In a follow-up paper we will therefore show how the techniques developed here extend to matrices with much more general probability distributions. We chose to treat the Gaussian case separately because it can be presented in a non-measure-theoretic setting which is accessible to a wider audience, and because it allows us to compare our results, which were obtained by arguments based on spherical geometry, directly with the results obtained in earlier papers via the very different techniques of linear algebra on Gaussian matrices.

On the other hand, it is sometimes argued that understanding the average behaviour of interior-point algorithms constitutes a step towards proving strong polynomiality of linear programming. This argument is somewhat weaker than the first, because it may of course be that linear programming is "strongly polynomial on average" for a wide range of input distributions whilst not allowing a strongly polynomial time algorithm. Thus, the two phenomena might be unrelated.

In our view the most relevant link between the results presented in this paper and the conjectured strong polynomiality of linear programming (and the closely associated linear feasibility problem treated here) consists not in bounds on the moments of $\log \mathscr{C}(A)$ but in the exponential decay of its distribution tails observed in Corollary 1. This fact explains why algorithmic experiments have a tendency to strengthen the intuition that the conjecture of strong polynomiality is true: $O(e^{t})$ simulations are needed to observe an event in which $\log \mathscr{C}(A) > t$ (and even in that case it is not guaranteed that an algorithm is necessarily slow at solving the problem). Thus, it is impossible to observe the really bad cases in random experiments. In contrast, in algorithms whose complexity depends polynomially on $\mathscr{C}(A)$, far fewer simulations reveal cases with extreme running times, because it takes $O(t)$ experiments to detect an event in which $\mathscr{C}(A) > t$. Of course, this argument is again subject to the criticism that the exponential decay of $P[\log \mathscr{C}(A) > t]$ might be a particularity of the chosen input distribution for the data of $A$. However, our analysis of more general distributions shows that exponential decay rates are a common feature of a very general family of distributions.
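
The counting argument above can be illustrated with a few lines of arithmetic. The sketch below assumes, purely for illustration, that the tail is exactly $P[\mathscr{C}(A) > t] = c/t$ with a hypothetical constant $c = 1$ (the paper only asserts bounds of this order for large $t$), and uses the fact that the mean waiting time for an event of probability $p$ in independent trials is $1/p$. The function names are illustrative and not taken from the paper.

```python
import math

def expected_trials(p):
    """Mean number of independent trials until an event of probability p first occurs."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return 1.0 / p

def tail_cond(t, c=1.0):
    """Hypothetical tail P[C(A) > t] = min(1, c / t); c is an illustrative constant."""
    return min(1.0, c / t)

def tail_log_cond(s, c=1.0):
    """Induced tail P[log C(A) > s] = P[C(A) > e^s] = min(1, c * e^{-s})."""
    return tail_cond(math.exp(s), c)

for level in (5, 10, 20):
    n_poly = expected_trials(tail_cond(level))      # running time driven by C(A)
    n_log = expected_trials(tail_log_cond(level))   # running time driven by log C(A)
    print(f"level {level:2d}: about {n_poly:.0f} trials vs about {n_log:.3g} trials")
```

Under this assumed tail, a cost level of 20 is reached after a few dozen random instances when the cost scales with $\mathscr{C}(A)$, but only after roughly $e^{20}$ instances when it scales with $\log \mathscr{C}(A)$.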

These observations suggest a notion of "empirical strong polynomiality": if linear programming is not strongly polynomial time solvable, then this fact cannot be observed in random experiments because the relevant events are exponentially rare. This is in our view the correct interpretation of our results and the main message of this paper.

3 Basic Notions and Notation 

If there exists $x \in \mathbb{R}^m$, $x \ne 0$, such that $Ax \le 0$ we say that $A$ is feasible. Otherwise, we say that $A$ is infeasible. Also, if there exists a vector $x$ such that $Ax < 0$ componentwise, then we say that $A$ is strictly feasible. If $A$ is feasible but not strictly feasible then we say that $A$ is ill-posed.

In the sequel we consider $\arccos(t)$ as a function from $[-1, 1]$ into $[0, \pi]$. In this region, both $\cos$ and $\arccos$ are decreasing functions.

Let $a_i$ be the $i$th row of $A$. Denote by $\theta_i(A, x)$ the angle between $a_i$ and $x$, that is, $\arccos\bigl(\frac{a_i \cdot x}{\|a_i\|\,\|x\|}\bigr)$.

Let $\theta(A, x) = \min_{1 \le i \le n} \theta_i(A, x)$ and $\bar{x}$ be any vector in $\mathbb{R}^m \setminus \{0\}$ such that $\theta(A) = \theta(A, \bar{x}) = \sup_{x \in \mathbb{R}^m \setminus \{0\}} \theta(A, x)$. The condition number $\mathscr{C}(A)$ is defined as

    $\mathscr{C}(A) = |\cos(\theta(A))|^{-1}$.    (6)

In the case where the system $Ax \le 0$ is feasible, $\mathscr{C}(A)$ is the same as Goffin's condition number $\mu$ [20, 21], which he developed to analyse step length rules that guarantee finite convergence of the relaxation method applied to feasible systems of linear inequalities. $\mathscr{C}(A)$ is also closely related to Agmon's condition number $\lambda$ [2] with which it coincides in some cases, but again $\lambda$ is only defined for feasible systems. Thus, $\mathscr{C}(A)$ is more general because it is defined both for feasible and infeasible systems.

It is not difficult to see that $A$ is strictly feasible iff $\theta(A) > \pi/2$, ill-posed iff $\theta(A) = \pi/2$ and infeasible iff $\theta(A) < \pi/2$. It is also easy to show, using a compactness argument, that a maximising vector such as the one used in the definition of $\mathscr{C}(A)$ exists. Note that since $\mathscr{C}(A)$ is defined purely in terms of angles between vectors, it is invariant under positive rescaling of the rows of $A$. Hence, we may assume without loss of generality that all rows of $A$ have been rescaled to unit length. Our analysis then reduces to geometry on the unit sphere.
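
As a small numerical illustration of these definitions, the sketch below evaluates $\theta_i(A, x)$ and $\theta(A, x)$ for a hand-picked matrix and trial vector, and checks that positive row rescaling leaves the angles unchanged. The matrix, the vector and the function names are illustrative assumptions; computing $\theta(A)$ itself would require maximising over $x$, which is not attempted here.

```python
import math

def angle(a, x):
    """theta_i(A, x): angle in [0, pi] between a row a and a vector x."""
    dot = sum(ai * xi for ai, xi in zip(a, x))
    na = math.sqrt(sum(ai * ai for ai in a))
    nx = math.sqrt(sum(xi * xi for xi in x))
    # Clamp to [-1, 1] to guard against rounding before calling acos.
    return math.acos(max(-1.0, min(1.0, dot / (na * nx))))

def theta(A, x):
    """theta(A, x) = min_i theta_i(A, x)."""
    return min(angle(a, x) for a in A)

# Illustrative 3 x 2 matrix (rows are a_i) and a hand-picked trial vector x.
A = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
x = [-1.0, -1.0]

# Positive rescaling of the rows leaves every angle unchanged.
scales = [3.0, 0.5, 10.0]
A_scaled = [[s * v for v in row] for s, row in zip(scales, A)]

t0 = theta(A, x)
t1 = theta(A_scaled, x)
print(t0, t1)

# theta(A) >= theta(A, x), so theta(A, x) > pi/2 certifies strict feasibility.
print("x certifies strict feasibility:", t0 > math.pi / 2)
```

Because $\theta(A)$ is a supremum over $x$, any single $x$ with $\theta(A, x) > \pi/2$ is enough to certify strict feasibility, whereas certifying infeasibility needs the full maximisation.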

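Let $S^{m-1}$ denote the unit sphere in $\mathbb{R}^m$. For $p \in S^{m-1}$ and $\rho \in [0, \pi]$ we denote by $\operatorname{cap}(p, \rho)$ the circular cap with centre in $p$ and angular radius $\rho$, that is,

    $\operatorname{cap}(p, \rho) = \{x \in S^{m-1} : x \cdot p \ge \cos \rho\}$.

By $\partial\operatorname{cap}(p, \rho)$ and $\operatorname{Int}(\operatorname{cap}(p, \rho))$ we denote the boundary and interior of $\operatorname{cap}(p, \rho)$, respectively, in the standard topology on $S^{m-1}$, that is,

    $\partial\operatorname{cap}(p, \rho) = \{x \in S^{m-1} : x \cdot p = \cos \rho\}$,
    $\operatorname{Int}(\operatorname{cap}(p, \rho)) = \{x \in S^{m-1} : x \cdot p > \cos \rho\}$.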



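The following result provides geometric insight about the condition number $\mathscr{C}(A)$.

Proposition 1 It is true that $0 \le \theta(A) \le \rho \le \pi$ if and only if $\bigcup_{i=1}^{n} \operatorname{cap}(a_i, \rho) = S^{m-1}$.

Proof.

    $\theta(A) \le \rho \iff \max_{x \in \mathbb{R}^m \setminus \{0\}} \theta(A, x) \le \rho$
    $\iff \forall x \in S^{m-1},\ \min_{1 \le i \le n} \theta_i(A, x) \le \rho$
    $\iff \forall x \in S^{m-1}\ \exists i \in \{1, 2, \dots, n\},\ x \in \operatorname{cap}(a_i, \rho)$
    $\iff \bigcup_{i=1}^{n} \operatorname{cap}(a_i, \rho) = S^{m-1}$.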



4 Characterising Extremal Circular Caps 

In this section $A$ is a real $n \times m$ matrix with unit row vectors $a_i$. A largest circular cap excluding rows of $A$ (in short, a LCP of $A$) is a cap $\operatorname{cap}(p^*, \rho^*)$ corresponding to a maximiser $p^*$ and its objective function value $\rho^* = \rho(p^*)$ for the optimisation problem

    $\max\ \rho(p)$,    (7)

where $\rho(p) = \min_i \arccos(p \cdot a_i)$. A smallest circular cap containing all of the $a_i$ ($i = 1, \dots, n$) (in short, a SCP of $A$) is a complement of a LCP of $A$.

The following result explains why the notions of LCP and SCP are important in the analysis of the condition number $\mathscr{C}(A)$.

Proposition 2 Let $\operatorname{cap}(p, \rho)$ be a LCP of $A$. Then $x^* = p$ maximises the function $\theta(A, x)$ and it is true that $\theta(A) = \theta(A, p) = \rho$, that is, $\mathscr{C}(A) = 1/|\cos \rho|$.

Proof. Let $x \in S^{m-1}$ be a maximiser of $\theta(A, x)$. Then $\theta(A, x) = \theta(A)$ and $\operatorname{Int}(\operatorname{cap}(x, \theta(A)))$ contains none of the rows of $A$. This shows that $\theta(A) \le \rho$. On the other hand, since $a_i \in \operatorname{cap}(-p, \pi - \rho)$ for all $i$, we have

    $\rho \le \min_{1 \le i \le n} \arccos(p \cdot a_i) = \theta(A, p) \le \theta(A)$.

□
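
For $m = 2$ the sphere is a circle and Proposition 2 can be checked by brute force. The sketch below is an illustration only, not a method used in the paper: it approximates a maximiser of $\rho(p)$ by a grid search over $S^1$ and then evaluates $1/|\cos \rho|$. The example matrix, the grid size and the function names are assumptions made for the illustration.

```python
import math

def rho(p, rows):
    """rho(p) = min_i arccos(p . a_i) for unit vectors p and a_i."""
    return min(
        math.acos(max(-1.0, min(1.0, sum(pi * ai for pi, ai in zip(p, a)))))
        for a in rows
    )

def lcp_on_circle(rows, n_grid=3600):
    """Approximate a maximiser of rho over S^1 by brute-force grid search (m = 2 only)."""
    best_p, best_rho = None, -1.0
    for k in range(n_grid):
        phi = 2.0 * math.pi * k / n_grid
        p = (math.cos(phi), math.sin(phi))
        r = rho(p, rows)
        if r > best_rho:
            best_p, best_rho = p, r
    return best_p, best_rho

# Four unit rows pointing along the coordinate axes: an infeasible example.
rows = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
p_star, rho_star = lcp_on_circle(rows)
cond = 1.0 / abs(math.cos(rho_star))
print(rho_star, cond)   # close to pi/4 and sqrt(2) for this example
```

In this example the maximiser is not unique (each diagonal direction gives the same radius), which matches the remark below that LCPs of infeasible matrices need not be unique.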



It is worth investigating the properties of LCPs and SCPs a bit further. A simple compactness 
argument shows that LCPs and SCPs exist for any A. If A is infeasible then these caps are not unique in 
general. To visualise this fact, it is helpful to consider the limiting case where a countable set of points 
densely fills what is left of the unit sphere after two circular caps of equal radius but different midpoints 
have been removed. 

If $A$ is strictly feasible however, there exists a unique LCP, respectively SCP, as the following convexity argument shows: it is easy to see that the strict feasibility of $A$ implies that $p^* \cdot a_i > 0$ for all $i$ and $0 < \varrho^* < \pi$ for any SCP $\operatorname{cap}(p^*, \varrho^*)$. It suffices therefore to argue that the minimisation problem

    $\min\ \varrho(p)$
    s.t. $p \cdot a_i > 0$ $(i = 1, \dots, n)$    (8)

has a unique local minimiser, where $\varrho(p) = \pi - \rho(-p) = \max_i \arccos(p \cdot a_i)$. Suppose $p_1 \ne p_2$ are two distinct local minimisers of (8) such that $\varrho_2 := \varrho(p_2) \ge \varrho_1 := \varrho(p_1)$. Then

    $p_j \cdot a_i \ge \cos \varrho_j > 0$ $(i = 1, \dots, n;\ j = 1, 2)$.    (9)

For $\lambda \in (0, 1)$ let $p(\lambda) := \lambda p_1 + (1 - \lambda) p_2$ and $\hat{p}(\lambda) := p(\lambda)/\|p(\lambda)\|$. Then $p_1 \ne p_2$ implies that $\|p(\lambda)\| < 1$ and

    $\hat{p}(\lambda) \cdot a_i > p(\lambda) \cdot a_i \ge \lambda \cos \varrho_1 + (1 - \lambda) \cos \varrho_2$ $(i = 1, \dots, n)$.    (10)

It follows from (10) that $p_2$ is not a local minimiser of (8) if $\varrho_2 > \varrho_1$, that is, it must be true that $\varrho_1 = \varrho_2$. But then, for all $\lambda \in (0, 1)$, (10) shows that $\operatorname{cap}(\hat{p}(\lambda), \varrho(\lambda))$ contains all $a_i$ $(i = 1, \dots, n)$, where $\varrho(\lambda) = \arccos(\min_i \hat{p}(\lambda) \cdot a_i) < \varrho_1$, contradicting the local optimality of $p_1$ and $p_2$. This shows that the SCP and, by extension, the LCP of $A$ are unique, as claimed.

Next we investigate the idea of blocking sets. Let $\operatorname{cap}(p, \rho)$ be a LCP of $A$. We say that

    $S = \{i : a_i \in \partial\operatorname{cap}(p, \rho)\}$    (11)

is the blocking set of $\operatorname{cap}(p, \rho)$. The blocking set corresponds to the vectors that locally keep the LCP from having a larger radius.² In fact, $S$ is the active set in the following equivalent reformulation of (7):

    $\max_{p \in S^{m-1}}\ \rho$
    s.t. $p \cdot a_i \le \cos \rho$ $(i = 1, \dots, n)$.



²We say "locally" in the sense that the largest cap not containing any of the $a_i$ and centred at a point $q$ has radius smaller than $\rho$ for all $q$ in a small neighbourhood around $p$. Note, however, that the blocking set does not prevent a point $q$ far away from $p$ from being the centre of an even larger cap not containing any of the $a_i$. This idea of a blocking set thus explains why $\operatorname{cap}(p, \rho)$ is a local minimiser of problem (8). Of course, a LCP is defined as a global minimiser of this problem. Thus, the existence of a blocking set is a local optimality condition. As always in nonlinear optimisation, useful global optimisation criteria don't really exist.

By straightforward extension we also speak of the blocking set of a SCP.

If $A$ is strictly feasible then the blocking set of the (unique) LCP can have any cardinality $\ge \min(2, n)$, as can easily be illustrated by the example of three points on the sphere that lie on a single great circle and within a sufficiently small angle of one another.

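The situation is rather different if $A$ is infeasible and $n > m$. In this case all blocking sets have cardinality $\ge m$. In fact, let $\operatorname{cap}(p, \rho)$ be a LCP of an infeasible $A$, that is, $\theta(A) = \rho < \pi/2$, let $S$ be the blocking set of $\operatorname{cap}(p, \rho)$ and suppose that $|S| < m$. Then there exists a vector $u \in S^{m-1} \cap \operatorname{Span}(S)^{\perp}$ such that $u \cdot p \ge 0$. Let $p_\delta = p + \delta u$. For $\delta > 0$ we have $\|p_\delta\|^2 = 1 + \delta^2 + 2\delta\, u \cdot p > 1$. Let $\hat{p}_\delta = p_\delta/\|p_\delta\|$ and $\rho_\delta = \arccos(\cos \rho/\|p_\delta\|)$. Then $\rho_\delta > \rho$. For all $i \notin S$ and $\delta > 0$ sufficiently small, $a_i \notin \operatorname{cap}(\hat{p}_\delta, \rho_\delta)$, since $a_i \notin \operatorname{cap}(p, \rho)$ and $\operatorname{cap}(\hat{p}_\delta, \rho_\delta)$ varies continuously as a function of $\delta$. Moreover, for $i \in S$ we have $p_\delta \cdot a_i = p \cdot a_i + \delta u \cdot a_i = p \cdot a_i = \cos \rho$. Therefore $\hat{p}_\delta \cdot a_i = \cos \rho/\|p_\delta\| = \cos \rho_\delta$, and this shows that $a_i \in \partial\operatorname{cap}(\hat{p}_\delta, \rho_\delta)$ for all $i \in S$. We conclude that $\operatorname{Int}(\operatorname{cap}(\hat{p}_\delta, \rho_\delta))$ contains none of the $a_i$, and since $\rho_\delta > \rho = \theta(A)$ this is a contradiction. Therefore, our assumption was wrong and $|S| \ge m$.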

Let us summarise what we have found so far. 
Proposition 3 Let $A \in \mathbb{R}^{n \times m}$ have unit rows. Then

(i) If $A$ is strictly feasible there exists a unique LCP and SCP of $A$ but the cardinality of the blocking set is arbitrary.

(ii) If $A$ is infeasible then there exist LCPs and SCPs of $A$ which are not unique in general, but their blocking sets always have cardinality $\ge m$.

Proof. See arguments above. □



Let us further explore the link between blocking sets and extremal circular caps. We consider the set of index sets of cardinality $m$, $\mathcal{P}_m = \{S \subseteq \{1, \dots, n\} : |S| = m\}$. If $S \in \mathcal{P}_m$ and $A \in \mathbb{R}^{n \times m}$ we denote by $A_S$ the $m \times m$ matrix obtained by removing all rows from $A$ with index not in $S$. Let $e = (1 \dots 1)^{\mathsf T} \in \mathbb{R}^m$. If $A_S$ is nonsingular, the vectors

    $u_S = A_S^{-1} e$ and $\hat{u}_S = \frac{u_S}{\|u_S\|}$    (12)

are well defined.

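Lemma 1 Let $S \in \mathcal{P}_m$ be such that $A_S$ is nonsingular, and let $p \in S^{m-1}$ and $\rho \in [0, \pi/2)$ be such that $a_i \in \partial\operatorname{cap}(p, \rho)$ for all $i \in S$. Then $p = \hat{u}_S$ and $\cos \rho = \|u_S\|^{-1}$.

Proof. If $a_i \in \partial\operatorname{cap}(p, \rho)$ then $a_i \cdot p = \cos \rho$. Therefore, $A_S p = (\cos \rho) e$ and $p = (\cos \rho) A_S^{-1} e = (\cos \rho) u_S$. Since $\|p\| = 1$, we have $|\cos \rho| = \|u_S\|^{-1}$ and, using $\rho < \pi/2$, $\cos \rho = \|u_S\|^{-1}$. We conclude that $p = \hat{u}_S$. □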
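
Lemma 1 says that $m$ linearly independent boundary points determine the cap through one linear solve. The following sketch checks this numerically for $m = 3$ using only the standard library; the helper names (`solve`, `cap_through_rows`) and the test configuration are illustrative assumptions, and the elimination routine is a generic textbook one rather than anything prescribed by the paper.

```python
import math

def solve(M, b):
    """Solve M u = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(M)
    aug = [list(row) + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-14:
            raise ValueError("matrix is numerically singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    u = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * u[c] for c in range(r + 1, n))
        u[r] = (aug[r][n] - s) / aug[r][r]
    return u

def cap_through_rows(A_S):
    """Return (u_hat, rho) with u_S = A_S^{-1} e, u_hat = u_S / ||u_S||, cos(rho) = 1 / ||u_S||."""
    u = solve(A_S, [1.0] * len(A_S))
    norm_u = math.sqrt(sum(v * v for v in u))
    # For unit rows ||u_S|| >= 1; the clamp only protects acos against rounding.
    return [v / norm_u for v in u], math.acos(min(1.0, 1.0 / norm_u))

# Three unit rows at a common angle alpha from the axis (0, 0, 1); values chosen for illustration.
alpha = 0.4
A_S = [
    [math.sin(alpha) * math.cos(phi), math.sin(alpha) * math.sin(phi), math.cos(alpha)]
    for phi in (0.0, 2.0, 4.0)
]
u_hat, rho = cap_through_rows(A_S)
print(u_hat, rho)
```

By construction the three rows lie on the boundary of the cap centred at $(0, 0, 1)$ with angular radius $\alpha$, so the computed centre and radius should reproduce these values up to rounding.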



Blocking sets are the key tool that allows us to gain information about the distribution of $\mathscr{C}(A)$ when the unit rows of $A$ are random vectors with known distribution. The fact that the blocking set $S$ may have cardinality $|S| < m$ for strictly feasible $A$ is an obstacle to this analysis that requires further attention. The following two lemmas allow us to overcome this problem in the analysis of upper and lower bounds on the tails of $\mathscr{C}(A)$ respectively.

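Lemma 2 Let $A \in \mathbb{R}^{n \times m}$ have $n > m$ unit rows, and let $\operatorname{cap}(p, \rho)$ with $\rho < \pi/2$ contain all rows of $A$. Then there exist $p' \in S^{m-1}$ and $\rho \le \rho' < \pi/2$ such that $\operatorname{cap}(p', \rho')$ also contains all rows of $A$ and at least $m$ of them lie on the boundary $\partial\operatorname{cap}(p', \rho')$.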
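Proof. Let $S = \{i : a_i \in \partial\operatorname{cap}(p, \rho)\}$, and let us assume that $|S| < m$. Then there exists a vector $u \in S^{m-1} \cap \operatorname{Span}(S)^{\perp}$, where we set $\operatorname{Span}(S)^{\perp} = \mathbb{R}^m$ if $S = \emptyset$, such that there exists an index $i \notin S$ with $a_i \cdot u \ne 0$. Without loss of generality we may assume that $p \cdot u \ge 0$. Let $i^*$ be a minimiser of

    $\min \Bigl\{ \frac{\cos \rho - a_i \cdot p}{a_i \cdot u} : i \notin S,\ a_i \cdot u < 0 \Bigr\}$

and

    $\delta = \frac{\cos \rho - a_{i^*} \cdot p}{a_{i^*} \cdot u}$.

Let finally $p_\delta = p + \delta u$. Then $p \cdot u \ge 0$ implies that $\|p_\delta\| > 1$ and hence, $\hat{p}_\delta = \|p_\delta\|^{-1} p_\delta$ and $\rho_\delta = \arccos(\|p_\delta\|^{-1} \cos \rho) \in [\rho, \pi/2)$ are well defined. The proof of Proposition 3 shows that $a_i \in \partial\operatorname{cap}(\hat{p}_\delta, \rho_\delta)$ for all $i \in S$. Moreover, the definition of $\delta$ shows that $a_{i^*} \cdot p_\delta = \cos \rho$ and hence that $a_{i^*} \cdot \hat{p}_\delta = \cos \rho_\delta$. This shows that $a_{i^*} \in \partial\operatorname{cap}(\hat{p}_\delta, \rho_\delta)$ and thus, $|S'| > |S|$ where

    $S' = \{i : a_i \in \partial\operatorname{cap}(\hat{p}_\delta, \rho_\delta)\}$.    (13)

Note also that if $a_i \notin \operatorname{cap}(\hat{p}_\delta, \rho_\delta)$ then $i \notin S$ and $a_i \cdot \hat{p}_\delta < \cos \rho_\delta$, that is, $a_i \cdot p_\delta < \cos \rho$ which contradicts the choice of $i^*$. Therefore, $\operatorname{cap}(\hat{p}_\delta, \rho_\delta)$ contains all rows of $A$. Using this construction recursively, we eventually arrive at the desired $p'$ and $\rho'$. □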



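Lemma 3 Let $a_1, \dots, a_m$ be linearly independent elements of $\partial\operatorname{cap}(p, \rho) \subset S^{m-1}$ with $\rho \in [0, \pi/2)$. Let $\pi_p : \mathbb{R}^m \to \operatorname{Span}(p)^{\perp}$ be the orthogonal projection along $p$. Then $\operatorname{cap}(p, \rho)$ is the SCP of $A = [a_1 \dots a_m]^{\mathsf T}$ if and only if $0 \in \operatorname{conv}(\pi_p a_1, \dots, \pi_p a_m)$.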

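Proof. Let us first remark that it is trivial to see that $\rho < \pi/2$ implies that

    $0 \in \operatorname{conv}(\pi_p a_1, \dots, \pi_p a_m) \iff p \in \operatorname{cone}(a_1, \dots, a_m)$.    (14)

Note also that the linear independence of the $a_i$ implies that $A$ is strictly feasible and the SCP of $A$ is unique.

We will now show the "only if" part of the lemma. Suppose $0 \notin \operatorname{conv}(\pi_p a_1, \dots, \pi_p a_m)$. Because of (14), Farkas' lemma implies that there exists $q \in S^{m-1}$ such that $q \cdot a_i \ge 0$ $(i = 1, \dots, m)$ and $q \cdot p < 0$. For $\delta > 0$ let

    $p_\delta = p + \delta q$, $\hat{p}_\delta = \frac{p_\delta}{\|p_\delta\|}$ and $\rho_\delta = \arccos \min_i (a_i \cdot \hat{p}_\delta)$.    (15)

Then $\|p_\delta\|^{-1} = 1 - \delta\, p \cdot q + O(\delta^2)$ and

    $a_i \cdot \hat{p}_\delta \ge (\cos \rho + \delta\, a_i \cdot q)(1 - \delta\, p \cdot q + O(\delta^2)) > \cos \rho$ $(i = 1, \dots, m)$

for all $\delta > 0$ small enough. Therefore, $\operatorname{cap}(\hat{p}_\delta, \rho_\delta)$ contains all $a_i$ and $\rho_\delta < \rho$ for $0 < \delta \ll 1$, showing that $\operatorname{cap}(p, \rho)$ is not the SCP of $A$.

It remains to show the "if" part of the lemma. Because of (14) we may assume that $p = \sum_{i=1}^{m} \lambda_i a_i$ for some $\lambda_1, \dots, \lambda_m \ge 0$. Let us assume that $\operatorname{cap}(p, \rho)$ is not the SCP of $A$. Then by a construction similar to (10) there exists a direction $d \in \operatorname{Span}(p)^{\perp}$ such that $\operatorname{cap}(\hat{p}_\delta, \rho_\delta)$ contains all $a_i$ and $\rho_\delta < \rho$ for $0 < \delta \ll 1$, where $\hat{p}_\delta$ and $\rho_\delta$ are defined as in (15). That is to say,

    $a_i \cdot \hat{p}_\delta = \frac{a_i \cdot p + \delta\, a_i \cdot d}{\sqrt{1 + \delta^2}} > \cos \rho = a_i \cdot p$

for $(i = 1, \dots, m)$ and $0 < \delta \ll 1$, which shows that $a_i \cdot d > 0$ for all $i$. But then $0 = p \cdot d = \sum_{i=1}^{m} \lambda_i\, a_i \cdot d > 0$, which shows that our assumption was wrong and $\operatorname{cap}(p, \rho)$ is indeed the SCP of $A$. □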
5 The Input Distribution

It is well known that if $A$ is a Gaussian matrix with rows $a_1, \dots, a_n$ then

    $\frac{a_1}{\|a_1\|}, \dots, \frac{a_n}{\|a_n\|},\ \|a_1\|^2, \dots, \|a_n\|^2$

are independent random vectors and variables, the first $n$ of which have uniform distribution on the sphere, $a_i/\|a_i\| \sim \mathcal{U}(S^{m-1})$, and the last $n$ of which are $\chi^2_m$ distributed on $\mathbb{R}_+$. Recall that $\mathscr{C}(A)$ is invariant under row scaling of $A$. The distribution of $\mathscr{C}(A)$ under Gaussian input is therefore the same as under input matrices with i.i.d. $\mathcal{U}(S^{m-1})$ rows.

In subsequent sections we will therefore assume that

    $A_i : \Omega \to S^{m-1}$ $(i = 1, \dots, n)$

are i.i.d. random vectors defined on a probability space $(\Omega, \mathcal{F}, P)$ such that $A_i \sim \mathcal{U}(S^{m-1})$, and that $A$ is the random $n \times m$ matrix

    $A = [A_1 \dots A_n]^{\mathsf T}$.

We say that $A$ is a uniform random matrix. For convenience we shall assume that $m \le n$, though the case $m > n$ is not difficult to derive from the results we will develop.
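
The reduction from Gaussian to uniform input suggests a direct way to simulate a uniform random matrix: draw standard Gaussian rows and normalise them. The sketch below does this with the standard library only; the function name, the seed and the sizes are illustrative assumptions.

```python
import math
import random

def uniform_random_matrix(n, m, rng):
    """Sample n i.i.d. rows uniformly on S^{m-1} by normalising standard Gaussian vectors."""
    rows = []
    for _ in range(n):
        g = [rng.gauss(0.0, 1.0) for _ in range(m)]
        norm = math.sqrt(sum(v * v for v in g))
        rows.append([v / norm for v in g])
    return rows

rng = random.Random(0)          # fixed seed so the run is reproducible
A = uniform_random_matrix(n=6, m=3, rng=rng)
for row in A:
    print(row, math.sqrt(sum(v * v for v in row)))
```

Since the condition number is invariant under positive row scaling, experiments on such matrices have the same distribution of $\mathscr{C}(A)$ as experiments on the unnormalised Gaussian matrices.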

We adhere to the usual practice in probability theory and say that a property holds almost surely (or for almost all $\omega \in \Omega$) if it holds with probability 1. An event that occurs with probability zero is also called a null-set.



Remark 1 Using Proposition 3 (ii) and Lemma 2 it is trivial to show that if $A$ is a uniform random matrix then almost surely $A$ is not ill-posed. Likewise, the sets $S$ from (11) and $S'$ from (13) are bases of $\mathbb{R}^m$ almost surely. Finally, any $m$ i.i.d. uniformly drawn vectors from $S^{m-1}$ are a basis of $\mathbb{R}^m$ almost surely. Therefore, when $A$ is a uniform random matrix then for all $S \in \mathcal{P}_m$ the matrix $A_S$ is almost surely nonsingular and the random vectors $u_S$ and $\hat{u}_S$ are defined for almost all $\omega \in \Omega$.



We denote the uniform probability measure on $S^{m-1}$ by $\nu_{m-1}$. It is well known that the $(m-1)$-dimensional volume of $\operatorname{cap}(p, \rho)$ equals the integral $I_{m-2}(\rho) = \int_0^{\rho} \sin^{m-2} x \, dx$ times the volume of the unit sphere in $\mathbb{R}^{m-1}$. Therefore,

    $P[A_i \in \operatorname{cap}(p, \rho)] = \nu_{m-1}(\operatorname{cap}(p, \rho)) = \frac{I_{m-2}(\rho)}{I_{m-2}(\pi)}$.    (16)
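
Formula (16) is easy to evaluate numerically. The following sketch approximates $I_k(\rho)$ with composite Simpson's rule and forms the ratio in (16); the quadrature rule, the step count and the function names are choices made for this illustration and are not part of the paper.

```python
import math

def sin_power_integral(k, rho, steps=2000):
    """I_k(rho) = integral of sin^k(x) over [0, rho], by composite Simpson's rule."""
    if steps % 2:
        steps += 1                      # Simpson's rule needs an even number of intervals
    h = rho / steps
    total = math.sin(0.0) ** k + math.sin(rho) ** k
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * math.sin(j * h) ** k
    return total * h / 3.0

def cap_measure(m, rho):
    """nu_{m-1}(cap(p, rho)) = I_{m-2}(rho) / I_{m-2}(pi), as in (16); requires m >= 2."""
    return sin_power_integral(m - 2, rho) / sin_power_integral(m - 2, math.pi)

# For m = 3 the ratio reduces to (1 - cos(rho)) / 2, so this is close to 0.25.
print(cap_measure(3, math.pi / 3))
# A hemisphere has measure 1/2 in every dimension.
print(cap_measure(5, math.pi / 2))
```

The closed form for $m = 3$ and the hemisphere value give two quick sanity checks on any implementation of (16).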

It is trivial to check by induction that 

/.(.) > (17) 



6 Upper Tail Bounds 



The goal of this section is to derive an upper bound on the tail probability $P[\mathscr{C}(A) > t]$ when $A$ is a uniform random matrix.




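Let $A$ be a uniform random matrix. For $S \in \mathcal{P}_m$ and $t > 1$ we consider the following events:

    $\mathcal{N}_t(S) = \{\omega \in \Omega : \|u_S(\omega)\| > t\}$,
    $\mathcal{B}_t(S) = \{\omega \in \Omega : A_i(\omega) \notin \operatorname{cap}(\hat{u}_S(\omega), \arccos(1/t))\ \forall i \notin S\}$,
    $\mathcal{B}_{\pi/2}(S) = \{\omega \in \Omega : A_i(\omega) \in \operatorname{cap}(\hat{u}_S(\omega), \pi/2)\ \forall i \notin S\}$,
    $\mathcal{A}_t^{I} = \{\omega \in \Omega : (A(\omega) \text{ infeasible}) \wedge (\mathscr{C}(A(\omega)) > t)\}$,
    $\mathcal{A}_t^{F} = \{\omega \in \Omega : (A(\omega) \text{ strictly feasible}) \wedge (\mathscr{C}(A(\omega)) > t)\}$,
    $\mathcal{S}_t(S) = \mathcal{N}_t(S) \cap \mathcal{B}_t(S)$,
    $\mathcal{S}_{t,\pi/2}(S) = \mathcal{N}_t(S) \cap \mathcal{B}_{\pi/2}(S)$.

The following lemma is a key tool in our analysis.

Lemma 4 Let $A$ be a uniform random matrix and $t \in [1, \infty)$. Then $\mathcal{A}_t^{I} \setminus \bigl(\bigcup_{S \in \mathcal{P}_m} \mathcal{S}_t(S)\bigr)$ and $\mathcal{A}_t^{F} \setminus \bigl(\bigcup_{S \in \mathcal{P}_m} \mathcal{S}_{t,\pi/2}(S)\bigr)$ are null-sets.

Proof. If $A = A(\omega)$ is infeasible and $\mathscr{C}(A) = 1/\cos \theta(A) > t$ then

    $\arccos(1/t) < \theta(A) < \pi/2$.    (18)

Let $\operatorname{cap}(p, \theta(A))$ be a LCP of $A$ and $S$ the corresponding blocking set. By Proposition 3 and Remark 1, we have $|S| = m$ for almost all $\omega \in \Omega$. Lemma 1 implies that $p = \hat{u}_S$ for these $\omega$ and $\arccos(1/\|u_S\|) = \theta(A)$, which together with (18) shows that $\|u_S\| > t$ and

    $\operatorname{cap}(\hat{u}_S, \arccos(1/t)) \subset \operatorname{cap}(\hat{u}_S, \arccos(1/\|u_S\|))$

does not contain any of the $A_i$ $(i \notin S)$. This shows that $\mathcal{A}_t^{I} \setminus \bigl(\bigcup_{S \in \mathcal{P}_m} \mathcal{S}_t(S)\bigr)$ is a null-set.

If $A = A(\omega)$ is strictly feasible and $\mathscr{C}(A) = 1/|\cos \theta(A)| > t$, then $\pi/2 < \theta(A) < \arccos(-1/t)$ and hence,

    $\frac{\pi}{2} > \pi - \theta(A) > \arccos(1/t)$.    (19)

Let $\operatorname{cap}(-p(\omega), \pi - \theta(A))$ be the (unique) SCP of $A$. Lemma 2 applied to this cap shows that there exist $\rho' \in [\pi - \theta(A), \pi/2)$ and $p' \in S^{m-1}$ such that $\operatorname{cap}(p', \rho')$ contains all rows of $A$. By Remark 1, $S = \{i : A_i(\omega) \in \partial\operatorname{cap}(p', \rho')\}$ is of cardinality $m$ for almost all $\omega \in \Omega$. Lemma 1 implies that $p' = \hat{u}_S$ and $\rho' = \arccos(1/\|u_S\|)$ for these $\omega$. Moreover, $\operatorname{cap}(\hat{u}_S, \pi/2) \supseteq \operatorname{cap}(\hat{u}_S, \rho')$ contains all rows of $A$ and in particular $\{A_i(\omega) : i \notin S\}$. This shows that $\mathcal{A}_t^{F} \setminus \bigl(\bigcup_{S \in \mathcal{P}_m} \mathcal{S}_{t,\pi/2}(S)\bigr)$ is a null-set. □
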

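For shorthand notation we write $S^* = \{1, \dots, m\} \in \mathcal{P}_m$, $\mathcal{S}_t^* = \mathcal{S}_t(S^*)$ and $\mathcal{S}_{t,\pi/2}^* = \mathcal{S}_{t,\pi/2}(S^*)$ in the sequel. Then, for $t > 1$,

$$\begin{aligned}
P[\mathscr{C}(A) > t] &= P[\mathcal{A}_t^I] + P[\mathcal{A}_t^F] + P[(A \text{ ill-posed}) \wedge (\mathscr{C}(A) > t)]\\
&\le P\Big[\bigcup_{S \in \mathcal{P}_m} \mathcal{S}_t(S)\Big] + P\Big[\bigcup_{S \in \mathcal{P}_m} \mathcal{S}_{t,\pi/2}(S)\Big] \qquad (20)\\
&\le \sum_{S \in \mathcal{P}_m} P[\mathcal{S}_t(S)] + \sum_{S \in \mathcal{P}_m} P[\mathcal{S}_{t,\pi/2}(S)]\\
&\le \binom{n}{m}\Big(P[\mathcal{S}_t^*] + P[\mathcal{S}_{t,\pi/2}^*]\Big), \qquad (21)
\end{aligned}$$

where (20) follows from Remark 1 and Lemma 4.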
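Note that

$$\begin{aligned}
P[\mathcal{S}_t^*] &= \int_{S^{m-1}} P\big[\mathcal{S}_t(S^*) \,\|\, \mathcal{N}_t(S^*), u_{S^*} = x\big]\, \nu_{m-1}(dx) \cdot P[\mathcal{N}_t(S^*)] \qquad (22)\\
&= \int_{S^{m-1}} \big(1 - \nu_{m-1}(\operatorname{cap}(x, \arccos(1/t)))\big)^{n-m}\, \nu_{m-1}(dx) \cdot P[\mathcal{N}_t(S^*)] \qquad (23)\\
&= \left(1 - \frac{I_{m-2}(\arccos(1/t))}{I_{m-2}(\pi)}\right)^{n-m} \cdot P[\mathcal{N}_t(S^*)], \qquad (24)
\end{aligned}$$

where (22) holds because $\mathscr{D}(u_{S^*} \,\|\, \mathcal{N}_t(S^*)) \sim \mathscr{U}(S^{m-1})$, (23) holds because the random vectors $A_i$ $(i \notin S^*)$ are independent of $A_j$ $(j \in S^*)$, and (24) follows from (16). Here, for a random variable $X$, $\mathscr{D}(X)$ denotes the distribution of $X$, and $\mathscr{U}(S^{m-1})$ denotes the uniform distribution on $S^{m-1}$. Likewise, a similar argument shows that

$$P[\mathcal{S}_{t,\pi/2}^*] = 2^{-(n-m)}\, P[\mathcal{N}_t(S^*)]. \tag{25}$$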



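The task remains to determine bounds on $P[\mathcal{N}_t(S^*)]$. For $j \in S^*$ let $B_j$ be the unique unit vector in $\operatorname{Span}(\{A_i : i \neq j\})^{\perp}$ that complements $\{A_i : i \neq j\}$ to a positively oriented basis of $\mathbb{R}^m$. Note that $A_j$ and $B_j$ are independent random vectors.

Lemma 5 Let $\mathcal{C}_t := \bigcup_{j \in S^*} \{\omega \in \Omega : |B_j(\omega) \cdot A_j(\omega)| < m/t\}$. Then $\mathcal{N}_t(S^*) \setminus \mathcal{C}_t$ is a null-set for all $t > 1$.

Proof. Almost surely $A_{S^*}$ is nonsingular and then

$$\|u_{S^*}\| \le \|A_{S^*}^{-1}\| \cdot \|e\| \le \|A_{S^*}^{-1}\|_F \sqrt{m}. \tag{26}$$

Together with (26) the inequality $\|u_{S^*}\| > t$ implies that $\|A_{S^*}^{-1}\|_F > t/\sqrt{m}$. Whenever this holds there exists an index $j \in S^*$ such that $\|A^{-1}_{\cdot j}\| > t/m$, where $A^{-1}_{\cdot j}$ denotes the $j$-th column of $A_{S^*}^{-1}$. Now the equation $A_{S^*} A_{S^*}^{-1} = I$ implies that $A_i \cdot A^{-1}_{\cdot j} = \delta_{ij}$ (the Kronecker symbol), which shows that $A^{-1}_{\cdot j} / \|A^{-1}_{\cdot j}\| = \pm B_j$ and $|B_j \cdot A_j| = 1/\|A^{-1}_{\cdot j}\| < m/t$. This proves the result. □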



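Lemma 6 For all $m > 3$, $u \in S^{m-1}$ and $(i = 1, \dots, n)$ it is true that

$$P\left[|A_i \cdot u| < \frac{m}{t}\right] \le \frac{m^{3/2}}{t}.$$

Proof. Since the statement is trivially true when $m \ge t$, we may assume without loss of generality that $m < t$ and that $\arccos(m/t)$ is well defined. Therefore,

$$|A_i \cdot u| < m/t \iff A_i \in \operatorname{cap}(u, \arccos(-m/t)) \cap \operatorname{cap}(-u, \arccos(-m/t)).$$

Thus,

$$\begin{aligned}
P[|A_i \cdot u| < m/t] &= \big(I_{m-2}(\arccos(-m/t)) - I_{m-2}(\arccos(m/t))\big)\,\big(I_{m-2}(\pi)\big)^{-1}\\
&= \frac{1}{I_{m-2}(\pi)} \int_{\arccos(m/t)}^{\arccos(-m/t)} \sin^{m-2} x \, dx \le \frac{m^{3/2}}{t},
\end{aligned}$$

where the last inequality follows from the fact that $m > 3$ and from equation (17). □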
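The bound of Lemma 6 can be compared with the exact probability from the proof. The sketch below is illustrative only: it evaluates the ratio of integrals by Simpson's rule (function names are hypothetical) and uses the reading $m^{3/2}/t$ of the bound, with $c = m/t$.

```python
import math


def simpson_sin_power(k, a, b, steps=2000):
    """Approximate the integral of sin(x)**k over [a, b] with the composite Simpson rule."""
    if a == b:
        return 0.0
    h = (b - a) / steps  # steps must be even
    total = math.sin(a) ** k + math.sin(b) ** k
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.sin(a + i * h) ** k
    return total * h / 3.0


def slab_probability(m, c):
    """Exact P[|A . u| <= c] for A uniform on S^{m-1} and 0 <= c <= 1."""
    k = m - 2
    band = simpson_sin_power(k, math.acos(c), math.acos(-c))
    return band / simpson_sin_power(k, 0.0, math.pi)


def lemma6_bound(m, t):
    """Upper bound m**(3/2) / t stated in Lemma 6."""
    return m ** 1.5 / t
```

For $m = 3$ the inner product $A_i \cdot u$ is uniform on $[-1, 1]$, so `slab_probability(3, c)` equals `c`, comfortably below $\sqrt{3}\,c$.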
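Lemma 6 allows us to compute the bound we seek as follows: for $t > 1$ and $m > 3$ we have

$$\begin{aligned}
P[\mathcal{N}_t(S^*)] &\overset{\text{Lem 5}}{\le} P[\mathcal{C}_t] \le \sum_{j \in S^*} P\left[|B_j \cdot A_j| < \frac{m}{t}\right]\\
&= \sum_{j \in S^*} \int_{S^{m-1}} P\left[|x \cdot A_j| < \frac{m}{t} \,\Big\|\, B_j = x\right] \nu_{m-1}(dx) \qquad (27)\\
&\le \frac{m^{5/2}}{t}, \qquad (28)
\end{aligned}$$

where (27) uses the fact that $B_j$ is uniformly distributed on the sphere because the $A_i$ $(i \neq j)$ are, and where (28) follows from Lemma 6 and the fact that $A_j$ is independent of $B_j$. Putting all the pieces together, we can now give an upper bound on the tail decay of $\mathscr{C}(A)$.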

Theorem 1 For all $t > 1$, $m > 3$ and $n > m$ it is true that

$$P[\mathscr{C}(A) > t] \le \binom{n}{m} \cdot 2 m^{5/2} \cdot \left(1 - \frac{I_{m-2}(\arccos(\frac{1}{t}))}{I_{m-2}(\pi)}\right)^{n-m} \cdot \frac{1}{t}.$$

Proof. The claim follows immediately from equations (21), (24), (25) and (28) together with the fact that $1 - I_{m-2}(\arccos(1/t))/I_{m-2}(\pi) \ge 1/2$. □
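To get a feeling for the size of the bound in Theorem 1, it can be evaluated numerically. This is a minimal sketch under the reading of the theorem given above; `theorem1_bound` is a hypothetical helper name, and the integrals $I_{m-2}$ are approximated by Simpson's rule.

```python
import math


def sin_power_integral(k, rho, steps=2000):
    """Approximate I_k(rho), the integral of sin(x)**k over [0, rho] (composite Simpson rule)."""
    if rho == 0.0:
        return 0.0
    h = rho / steps  # steps must be even
    total = math.sin(0.0) ** k + math.sin(rho) ** k
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.sin(i * h) ** k
    return total * h / 3.0


def theorem1_bound(n, m, t):
    """binom(n, m) * 2 * m**(5/2) * (1 - I(arccos(1/t)) / I(pi))**(n - m) / t."""
    ratio = sin_power_integral(m - 2, math.acos(1.0 / t)) / sin_power_integral(m - 2, math.pi)
    return math.comb(n, m) * 2.0 * m ** 2.5 * (1.0 - ratio) ** (n - m) / t
```

Note that for moderate $t$ the value exceeds 1, so the bound only becomes informative once $t$ is large compared with $\binom{n}{m} m^{5/2}$.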



7 Lower Tail Bounds 

The goal of this section is to derive lower bounds on the decay rates of $\mathscr{C}(A)$ in Theorem 2. In Section 8 we will see that the combination of Theorems 1 and 2 yields the exact asymptotic decay rates of $\log \mathscr{C}(A)$.

Since $K \cap S^{m-1}$ is a Borel set for all convex cones $K \subset \mathbb{R}^m$, the angle space of $K$

$$\operatorname{as}(K) = \nu_{m-1}(K \cap S^{m-1})$$

is well-defined. It follows from the remarks of Section 5 that alternative equivalent definitions are provided by the relations

$$\operatorname{as}(K) = P[X \in K \cap S^{m-1}] = P[Y \in K],$$

where $X \sim \mathscr{U}(S^{m-1})$ is a uniform random vector on the unit sphere and $Y$ is a multivariate normal random vector on $\mathbb{R}^m$ with covariance matrix $\sigma^2 I$ for any $\sigma > 0$.

Let $A_i \sim \mathscr{U}(S^{m-1})$ $(i = 1, \dots, k)$ be i.i.d. random vectors, where $k \le m$, and let $\Omega(A_1, \dots, A_k) \subset \Omega$ be the event that $\{A_1(\omega), \dots, A_k(\omega)\}$ is a linearly independent set of vectors. Then $\Omega \setminus \Omega(A_1, \dots, A_k)$ is a null-set, and for all $\omega \in \Omega(A_1, \dots, A_k)$ there exists a unique orthonormal basis $\{E_1, \dots, E_k\}$ of $\operatorname{Span}(A_1, \dots, A_k)$ such that $E_i \cdot A_i > 0$ and $\operatorname{Span}(E_1, \dots, E_i) = \operatorname{Span}(A_1, \dots, A_i)$ for $(i = 1, \dots, k)$. In fact, the vectors $E_i$ are the column vectors of $Q$ in the thin QR factorisation of the matrix $[A_1 \dots A_k]$ (i.e., the $E_i$ are obtained by Gram-Schmidt orthogonalisation of the $A_i$). Let us consider the event

$$\mathcal{C}_{m,k} = \{\omega \in \Omega(A_1, \dots, A_k) : \operatorname{cone}(A_1, \dots, A_k) \supseteq \operatorname{cone}(E_1, \dots, E_k)\}. \tag{29}$$

If $k = m$ we write $\mathcal{C}_m$ instead of $\mathcal{C}_{m,m}$. Note that in this case, $\operatorname{as}(\operatorname{cone}(E_1, \dots, E_m)) = 2^{-m}$. The following lemma thus shows that the angle space of the cone generated by the $A_i$ is not too small with a quantifiable probability.
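The basis $\{E_1, \dots, E_k\}$ described above can be computed explicitly. The following sketch uses modified Gram-Schmidt in plain Python; the helper names are illustrative and the tolerance used to detect linear dependence is an assumption.

```python
import math


def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors, tol=1e-12):
    """Return orthonormal E_1..E_k with Span(E_1..E_i) = Span(A_1..A_i) and E_i . A_i > 0."""
    basis = []
    for a in vectors:
        r = list(a)
        for e in basis:
            c = dot(r, e)  # remove the component along each earlier basis vector
            r = [ri - c * ei for ri, ei in zip(r, e)]
        norm = math.sqrt(dot(r, r))
        if norm <= tol:
            raise ValueError("vectors are linearly dependent")
        basis.append([ri / norm for ri in r])
    return basis
```

Because each $E_i$ is the normalised residual of $A_i$, the inner product $E_i \cdot A_i$ equals the norm of that residual and is therefore positive, which is the sign convention required in the text.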
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Lemma 7 Let Ai, . . . , Am be i.i.d. random vectors with Ai ^ (S™ Then 

P[C™] >2-'^"'2"'-'', (m>l). (30) 

Proof. Wc proceed by induction over m. For m = 1 we have P [cone(Ai) D cone(£'i)] = 1 > 2~-^, 
which shows that (30) holds true in the base case. 

Suppose (30) holds true for m — 1 and let us show that it holds for m. For almost all a; € 

A„(a;) ^Span(i;i(w),...,i;„_i(a;)). (31) 
Let us thus assume that (31) holds and let 

7rs„ : Span (-E'i(w), . . . , Em{uj)) Span (£^i(w), . . . , Era-i{u))) 
denote the orthogonal projection along Em{i^)- Let 

I lFB„(-^m)|| J 

We claim that 

C {Em&cone{E^,...,Em-i,Am)) . (32) 

In fact, 

J^eA ^m) g cone(£;i,...,£'m_i) <^7r£;„(-A„) e cone (.Bi, . . . , iJ^.i) , 

except on a nuUset, and 7r£;^(— >!„) e cone {Ei, . . . , Em-i) implies that there exist /Xj > (z = 1, . . . , m — 
1) and Urn > such that 

711— 1 

= l^mEm + TTE^Am = ^rnErri — /Ui^^j. 

i=l 

Hence, E^, = /U~^(A„ + fJ-iEi), which proves (32). Clearly (32) implies that 

(33) 

Let $G_{m-1,m}$ be the Grassmannian of 1-codimensional linear subspaces of $\mathbb{R}^m$. $G_{m-1,m}$ is a compact manifold with a transitive group action defined by the orthogonal group $O_m$. The $G_{m-1,m}$-valued random variable

$$G : \omega \mapsto \operatorname{Span}(A_1(\omega), \dots, A_{m-1}(\omega)) \in G_{m-1,m} \tag{34}$$

has uniform distribution $\nu_{G_{m-1,m}}$; that is, $\nu_{G_{m-1,m}}$ is the unique probability measure on $G_{m-1,m}$ that is invariant under the group action of $O_m$. In fact, this follows trivially from the spatial symmetry of the joint distribution of the $A_i$ $(i = 1, \dots, m-1)$. It follows likewise from this symmetry that for all $g \in G_{m-1,m}$ the random vectors

{-Am) 



r, Al, . . . ,A, 



m—1 



are independent random variables when conditioned on the event {w e O : G{u)) = g}, with uniform 
conditional distributions 



(35) 
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These facts, (33) and the induction hypothesis finally imply 

P[CJ >P[P„nC„_i] 



-L 



P 



A 



i) = y,G = g 



Vm.-2{dy)^ 



-'m—l 



G = g VG^_i,„M9) 



> 



2+m(m-l) 



□ 

The next lemma shows that the angle space defined on a 1-codimensional hyperplane does not change 
too much under an orthogonal projection into a nearby hyperplane. 

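Lemma 8 Let $p_1, p_2 \in S^{m-1}$, let us denote the angle space defined on $p_i^{\perp} = \{x \in \mathbb{R}^m : p_i \cdot x = 0\}$ by $\operatorname{as}_i$ $(i = 1, 2)$, let $\pi_{p_2^{\perp}}$ be the orthogonal projection of $\mathbb{R}^m$ into $p_2^{\perp}$ along $p_2$ and let $\pi = \pi_{p_2^{\perp}}|_{p_1^{\perp}}$ be its restriction to $p_1^{\perp}$. Let finally $K$ be a convex cone in $p_1^{\perp}$. Then

$$\operatorname{as}_2(\pi K) \ge \operatorname{as}_1(K)\, |p_1 \cdot p_2|.$$

Proof. If $p_1 \cdot p_2 = 0$ then the bound is trivial. Therefore, w.l.o.g. $p_1 \cdot p_2 \neq 0$ and then $\pi$ is a vector space isomorphism between $p_1^{\perp}$ and $p_2^{\perp}$. Let $\{e_1, \dots, e_{m-2}\}$ be an orthonormal basis of $p_1^{\perp} \cap p_2^{\perp}$, let $e_j^{(i)} = e_j$ $(j = 1, \dots, m-2)$ and let $e_{m-1}^{(i)}$ be chosen so that $\{e_1^{(i)}, \dots, e_{m-1}^{(i)}\}$ is an orthonormal basis of $p_i^{\perp}$ for $(i = 1, 2)$. Then $|e_{m-1}^{(1)} \cdot e_{m-1}^{(2)}| = |p_1 \cdot p_2|$. Let us express vectors in $p_1^{\perp}$ in terms of coordinates $y \in \mathbb{R}^{m-1}$ defined by linear combinations $\sum_{j=1}^{m-1} y_j e_j^{(1)}$. Likewise, let $z$ be the coordinate system defined on $p_2^{\perp}$ by $\{e_1^{(2)}, \dots, e_{m-1}^{(2)}\}$. Then $\pi$ expressed in terms of $y$-$z$ coordinates is the matrix

$$\pi = \begin{pmatrix} I & 0 \\ 0 & e_{m-1}^{(1)} \cdot e_{m-1}^{(2)} \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0 & \pm|p_1 \cdot p_2| \end{pmatrix}.$$

Now let $Z$ be a multivariate standard normal random vector on $p_2^{\perp}$, that is, $Z$ has the density function

$$f_Z(z) = (2\pi)^{-\frac{m-1}{2}} \exp\left(-\frac{1}{2}\|z\|^2\right).$$

Then

$$\operatorname{as}_2(\pi K) = P[Z \in \pi(K)] = P[Y \in K], \tag{36}$$

where $Y = \pi^{-1} Z$ has density

$$\begin{aligned}
f_Y(y) &= f_Z(z(y)) \left|\det\left(\frac{\partial z_i}{\partial y_j}\right)\right|\\
&= (2\pi)^{-\frac{m-1}{2}} \exp\left(-\frac{1}{2}\Big(\sum_{j=1}^{m-2} y_j^2 + |p_1 \cdot p_2|^2 y_{m-1}^2\Big)\right) |p_1 \cdot p_2|\\
&\ge (2\pi)^{-\frac{m-1}{2}} \exp\left(-\frac{1}{2}\|y\|^2\right) |p_1 \cdot p_2|,
\end{aligned}$$

where the last inequality holds because $|p_1 \cdot p_2| \le 1$. Therefore,

$$P[Y \in K] \ge |p_1 \cdot p_2| \int_K (2\pi)^{-\frac{m-1}{2}} \exp\left(-\frac{1}{2}\|y\|^2\right) dy = |p_1 \cdot p_2| \operatorname{as}_1(K).$$

Together with (36) this proves the lemma. □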
The combination of Lemmas 3, 7 and 8 now allows us to derive lower bounds on the tail probabilities of $\mathscr{C}(A)$.

Theorem 2 Let $A$ be a uniform random $n \times m$ matrix where $m > 2$ and $n > m$. Then there exists a constant $c(m) > 0$ that depends only on $m$ such that for all $t > 1/\cos(\pi/4)$ it is true that

$$P[\mathscr{C}(A) > t] \ge c(m) \left(\frac{I_{m-2}(\arccos\frac{1}{t})}{I_{m-2}(\pi)}\right)^{n-m} \frac{1}{t}.$$

Proof. It follows from (35), the definition of G in Lemma 7 and the claim of the same result that 

P [Cm,k-i] = [ P [Cm,k-i\\G = g] iyGrn.,,Ad9) 

2 + (m-l)(m-2) f 

JGm-l,m 

+ ^^^^ 

Since {Ai, . . . , A^n} and {£"1, . . . , E„i-i, A^n} are linearly independent sets for all ui G Cm, it follows from 
Proposition 3 (i) that the vectors {Ai, . . . ,Am} define a unique SCP ca.-p(^PA{co) , Ra{ui)) , and likewise 
there exists a unique SCP cap(PE(w), Re{uj)) corresponding to the set of vectors {Ei, . . . , Em-i,Am}- 
Moreover, for all w G Cm,k-i we have ca.-p(PE, Re) C ca.p[PA, Ra) , and hence, 

< -Re(w) < i?A(w) < I Va;eC„,fc_i. (38) 

Inequalities (37) and (38) imply that for all i > 1, 

P [Ra > arccosl/f] > P [Ra > arccos l/t||Cm,fe_i] P [Cm,fe-i] 

>P[Re> arccos l/t\\Cm,k-i] ■ 2-'"'*"~2"""" . (39) 

Let {ei, . . . , Cm} be the canonical basis of M™. Then {ei, . . . , Cm-i, Am} is linearly independent almost 
surely, and it follows from Proposition 3 (i) that there exists a unique SCP cap(P, R) that corresponds 
to these vectors. Since Am is independent of the event Cm,k-i, which is defined entirely in terms of 
Ai, . . . , Am-i, and since the invariance of ^{Ai) under the action of the orthogonal group Om on S™~^ 
implies that [Ei, . . . , Em-i] is uniformly distributed on the Stiefel manifold Vm-i of m x (m— 1) matrices 
with orthonormal columns, we have 

P [Re > arccos l/t\\Cm,k-i] = P [-R > arccos 1/t] . (40) 
We will consider the unit vectors 

-j^ m — 1 

e = 7 Bj and = cosi? • + sini? • e for 1? e [— 7r/2, 7r/2]. 

vm — 1 ^ 



i=l 



Then a random angle : O — > [— 7r/2,7r/2) is defined almost everywhere by the condition ^^(a;) G 
^'e(w) + ^1- ^^^^^ 



e 



r arctan if E™T' Am-Si^l, 



if Ell/ Am-ei = l and • ^ 0, 



undefined otherwise. 



Note that pj^ + ei = pj^ + Ci (i = 2, . . . , m — 1) for all G [—■ 7r/2, 7r/2), that is, the definition of 6 is 
symmetric with respect to the Cj. 
It is easy to see that $\Theta$ has a continuous density function $f_\Theta > 0$ such that $f_\Theta(-\vartheta) = f_\Theta(\vartheta)$ for all $\vartheta \in (-\pi/2, \pi/2)$, and there exists a constant $c_\Theta > 0$ such that $f_\Theta(\vartheta) \ge c_\Theta$ for all $\vartheta$ in the compact set $[-\pi/4, \pi/4]$.

Note that one can parameterise the sphere S"*"^ by S™~^ x [— 7r/2,7r/2) via 



( AjCi, 1? I 1-^ ( MjCj i • cosi? + , Ptf, 



for EIl"l' AiCi e S'"-2 c Span (ei 



/ cos)9+(m-2) 
m — 1 

cos(i?)-l 
m— 1 



, &m~i)i that is, X]™^]^ = 1, and where 

cos(i?)-l 



cos(<?)-l 
m— 1 



sin 1 



11? \ 
Vtti— 1 



V 



cos(^?)-l 

m— 1 
sin i9 

\/m— 1 



cost9+(m-2) 
m— 1 



cos(i?)-l 

m— 1 
sin "d 

\/m~- 1 



cos^?+(m— 2) sint^ 
m— 1 V^^ — 1 

cos?? 



Am-l 

V / 



/m— 1 



Note also that the matrix appearing in the display is orthogonal with last column corresponding to $p_\vartheta$. Thus, the chosen parameterisation corresponds to tilting the unit sphere $S^{m-2} \subset \operatorname{Span}(e_1, \dots, e_{m-1})$ by an angle $\vartheta$ about the affine hull $\operatorname{aff}(e_1, \dots, e_{m-1})$ and shrinking it by $\cos\vartheta$ to fit the radius of the sphere cut out of $S^{m-1}$ by the tilted plane.

This parameterisation defines a conditional distribution $\mathscr{D}(A_m \,\|\, \Theta = \vartheta)$ on $S^{m-2}$ with continuous Radon-Nikodym derivative $f_{A_m \| \Theta}$ with respect to $\nu_{m-2}$. Moreover, $f_{A_m \| \Theta}(x \,\|\, \vartheta) = 0$ if and only if $x \in \operatorname{aff}(e_1, \dots, e_{m-1})$. Therefore, there exists a constant $c_A > 0$ such that

$$f_{A_m \| \Theta}(x \,\|\, \vartheta) \ge c_A \tag{41}$$

for all $(x, \vartheta)$ in the compact set $\{x \in S^{m-2} : x \cdot e \le 0\} \times [-\pi/4, \pi/4]$.

Now Lemma 3 shows that for all t > l/cos(7r/4), 

P[R> arccos(l/t)] = P [(6 e [arccos 1/t - Tr/2, 7r/2 - arccos l/t\) 

A {-np±Am e cone(7rpiei, . . . ,TTp±em-i)))] 



/"Y— arccos j r 

= / fA^^\eix\\^)fei^)um-2idx)d^ 

J arccos j — ^ ^ tt^j. (— cone(ei,...,eTn-i)) 



(41),Lem 8 
> 



arccos -r — 



> CaCq 



CACe ■ 2-('»-2) 



CA-2-(™-i)|cos^'| •/e(t?)(ii? 

arccos j \ 



Therefore, (39) and (40) imply that 

P [Ra > arccos 1/i] > 2" 



(^+ '"'-^^i"'-^0 .c^ce-2-(-^).l 



(42) 



Finally, if A^+i, ■ . ■ ,An € cap(P^, i?^) then cap(PA, Ra) is the SCP of {Ai, . . . , An} and it follows 
from the remarks of Section 4 that ^(A) = | cosi?^|~^. Therefore, 

P ["^{A) > t] > P [(A„+i, . . . , A„ e cap(P^, A [Ra > arccos 1/t)] 

1 \ \ """^ 



> 



-2 (arccos ■ 



• 2 



(„_1)(„_2) 



). 
Since $c_A$ and $c_\Theta$ depend only on $m$, this proves the claim of the theorem.



□ 



8 Exact Tail Decay Rates 

The decay rates of $P[\mathscr{C}(A) > t]$ developed in Sections 6 and 7 give an estimate on the rarity of a large backward error, high instability or long running times for some algorithms applied to a random linear feasibility problem drawn from Gaussian data. Moreover, as mentioned in the introduction, the best upper bounds on the running time of modern linear programming or linear feasibility solvers applied to real input data are polynomial in the problem dimension and $\log \mathscr{C}(A)$. We are therefore also interested in estimates on probability tails

$$P[\log \mathscr{C}(A) > t] \tag{43}$$

for $t > 1$.

Theorem 1 implies that

$$P[\log \mathscr{C}(A) > t] = P[\mathscr{C}(A) > e^t] \le \binom{n}{m} 2 m^{5/2} \left(1 - \frac{I_{m-2}(\arccos(e^{-t}))}{I_{m-2}(\pi)}\right)^{n-m} e^{-t}. \tag{44}$$

On the other hand, Theorem 2 shows

$$P[\log \mathscr{C}(A) > t] \ge c(m) \left(\frac{I_{m-2}(\arccos(e^{-t}))}{I_{m-2}(\pi)}\right)^{n-m} e^{-t}.$$

Since $I_{m-2}(\arccos e^{-t})/I_{m-2}(\pi)$ increases monotonically to $1/2$ for $t \to \infty$, these formulas show that the exponential decay rate of (43) is exactly $-1$.

Corollary 1 If $A$ is a random uniform $n \times m$ matrix then

$$\lim_{t \to \infty} \frac{\log P[\log \mathscr{C}(A) > t]}{t} = -1.$$

Proof. The proof is immediate from the arguments above. □
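The decay rate in Corollary 1 can be illustrated on the upper bound (44) alone, since the constant $c(m)$ in the lower bound is not explicit. The sketch below (hypothetical function names, Simpson-rule integrals) evaluates the logarithm of the right-hand side of (44) divided by $t$, which should approach $-1$ as $t$ grows.

```python
import math


def sin_power_integral(k, rho, steps=2000):
    """Approximate I_k(rho), the integral of sin(x)**k over [0, rho] (composite Simpson rule)."""
    if rho == 0.0:
        return 0.0
    h = rho / steps  # steps must be even
    total = math.sin(0.0) ** k + math.sin(rho) ** k
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.sin(i * h) ** k
    return total * h / 3.0


def log_upper_tail(n, m, t):
    """Logarithm of the right-hand side of (44)."""
    ratio = sin_power_integral(m - 2, math.acos(math.exp(-t))) / sin_power_integral(m - 2, math.pi)
    return (math.log(math.comb(n, m)) + math.log(2.0) + 2.5 * math.log(m)
            + (n - m) * math.log(1.0 - ratio) - t)


def decay_rate(n, m, t):
    """log(bound) / t, which tends to -1 as t tends to infinity."""
    return log_upper_tail(n, m, t) / t
```

For example, with $n = 10$ and $m = 3$ the additive constants contribute only a few units, so the ratio is already close to $-1$ for $t$ in the hundreds.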

Thus, although the multiplicative constant in (44) is too large, the formula captures the correct qualitative behaviour of the tails of $\log \mathscr{C}(A)$ and the best possible upper bound on (43) must be of the form

$$P[\log \mathscr{C}(A) > t] \le c(m, n) \cdot e^{-t} \tag{45}$$

for some constant $c(m, n)$ that depends on $m$ and $n$.

The exponential decay of $P[\log \mathscr{C}(A) > t]$ shows that the linear feasibility problem, and by extension linear programming, is "empirically strongly polynomial". See Section 2 for further comments on this important point.



9 Moment Estimates 

The probabilistic analysis of linear programming is primarily concerned with the average running time of LP algorithms on random input data. Because complexity bounds for interior-point methods are polynomial in $\log \mathscr{C}(A)$ (see the introduction), upper bounds on the expectation, the variance and higher moments of the running time are easily derived from upper bounds on the corresponding moments of $\log \mathscr{C}(A)$.
Since $E[X] = \int_0^\infty P[X > x]\,dx$ for any random variable $X$ that takes only nonnegative values, the estimate (44) can be used to derive upper bounds on all moments of $\log \mathscr{C}(A)$. Indeed, (45) shows that for all $\gamma > 0$,

$$\int_0^\infty P[(\log \mathscr{C}(A))^\gamma > t]\,dt \le \int_0^\infty c(n,m)\, e^{-t^{1/\gamma}}\,dt = c(n,m)\,\Gamma(\gamma + 1) < \infty,$$

that is, all moments of $\log \mathscr{C}(A)$ are finite. To turn this into a quantitative estimate, we consider the function $\varphi : \mathbb{R}_+ \to (\tfrac{1}{2}, 1]$ defined as follows:

$$\varphi(t) = 1 - \frac{I_{m-2}(\arccos(e^{-t}))}{I_{m-2}(\pi)}.$$

Note that $\varphi$ is continuous, and strictly decreasing with $\varphi(0) = 1$ and $\lim_{t \to \infty} \varphi(t) = \tfrac{1}{2}$. Let us define

/(m,n) = <p-i ((l/2)V^A^). 
Since /(m, n) > 0, the following result follows. 

Corollary 2 Let A be a uniform random n x m matrix with n > m > 3. Then for all 7 £ 1R+ the j-th 
moment of log '^( A) is bounded by 

E[(log^(A))^] < f{m,nr +(^^2mh-'^T{^ + l). 
Proof. Using (44), we find 

E [{\og'^{A)y] = r p[{iog'^{A)y > t] dt 

Jo 



r°° fn\ 5 f /„_2(arccos e-/^"*'") )\ i 

Jf(m,n)-' \mj \ Im-2W J 

<f{m,ny+(^)2mh-^ r c-*^ dt 



< f(m, + {^] 2mh~'^r(j + 1). 
mj 



In Section 10 we will see that the bounds of Corollary 2 are particularly useful for understanding the behaviour of $\mathscr{C}(A)$ when $n \gg m$. Note however that these bounds grow exponentially in $m$. One of the major objectives of the probabilistic analysis of linear programming is to show that the expected running times of particular families of algorithms are bounded by a polynomial of the dimension of the input data. Such results are often interpreted in the light of "average strong polynomiality" of linear programming.

Does the exponential growth of the estimates from Corollary 2 mean that Theorem 1 fails to lead to "average strong polynomiality" results when used to bound the complexity of interior-point algorithms for example? Not in the least! The exponential growth of the estimates from Corollary 2 is purely a consequence of our definition of the cut-off point $f(m, n)$, which we chose so as to converge to zero as $n$ tends to infinity to fit the purposes of the limit theorems of Section 10. Giving up on this condition one can easily derive linear bounds:
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Lemma 9 Let {X„i.n '■ {m,n) G N x N} be a set of random variables and 7 > 1 a real number. 
Furthermore, let p(m, n) and t{m, n) be functions of m and n such that 



Then 



Proof. 



E [Xm,n] < max(p(m, ny, t{m, n)) + T{-f + l)2'>-\ 



POO 

;[x] = / P[x>t] 

Jo 

/ Idt + / exp < p{m, n) — f^ \ 

Jo Jni8cc(p(Tn,n)~' ,t(m,n)) ^ ' 



max(p(m,n)'^,t(m,n)) r-oo 

< I Idt + I exp -i p(m, n) — f T- !■ dt 

max(p(m,n)'>,t(m,7i)) 
f°° ( 1 1 "I 

< max(p(m, n)''', t(TO, n)) + / e-xp l—t~2^~~ \ dt (46) 
(p(m, n)\ t{m, n)) + T{-f + l)2^-\ 



= max 

where (46) holds true because we claim that 

p{m,n) — ty2^~y < — {t — max(p(m, n)''', f(m, n))) ^ 

for all t > max.(j){m,n)'^,t{m,n)). In fact, x t-^ x~ is a concave function, since 7 > 1. Therefore 

111 i 
- (p(m, n)T) T +- (i-max(p(m,n)^,i(m,n)))-^ 

< 2~~ {pirn, ny + t — max(p(m, n)'^ ,t(m, n))) ^ 

1 1 

< 2~^t^, 

which shows that ^ 

p{m,n) + {t — max(p(m, n)''', f(m, n))) < 2^~^t'^ 

and proves our claim. 



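Corollary 3 Let $A$ be a uniform random $n \times m$ matrix where $n > m > 3$, and let $\gamma > 1$ be a real number. Then

$$E[(\log \mathscr{C}(A))^\gamma] \le \left(m \log n + \tfrac{5}{2} \log m + \log 2\right)^\gamma + \Gamma(\gamma + 1)\, 2^{\gamma - 1}.$$

In particular,

$$\begin{aligned}
E[\log \mathscr{C}(A)] &\le m \log n + \tfrac{5}{2} \log m + \log 2 + 1,\\
\operatorname{VAR}(\log \mathscr{C}(A)) &\le \left(m \log n + \tfrac{5}{2} \log m + \log 2\right)^2 + 4.
\end{aligned}$$

Proof. Equation (44) shows that for all $t > t(m,n) = 1$,

$$\begin{aligned}
P[(\log \mathscr{C}(A))^\gamma > t] &= P[\log \mathscr{C}(A) > t^{1/\gamma}]\\
&\le \binom{n}{m} 2 m^{5/2} \left(1 - \frac{I_{m-2}(\arccos(\exp\{-t^{1/\gamma}\}))}{I_{m-2}(\pi)}\right)^{n-m} \exp\{-t^{1/\gamma}\}\\
&\le \exp\left\{m \log n + \tfrac{5}{2} \log m + \log 2 - t^{1/\gamma}\right\}.
\end{aligned}$$

The first claim now follows from Lemma 9. Finally, since

$$\operatorname{VAR}(\log \mathscr{C}(A)) = E[(\log \mathscr{C}(A))^2] - E[\log \mathscr{C}(A)]^2 \le E[(\log \mathscr{C}(A))^2],$$

the last two claims are special cases of the first claim. □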
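The bounds of Corollary 3 are simple enough to tabulate. The following sketch evaluates them under the reading given above; `corollary3_bound` is a hypothetical helper name.

```python
import math


def corollary3_bound(n, m, gamma_=1.0):
    """(m log n + (5/2) log m + log 2)**gamma_ + Gamma(gamma_ + 1) * 2**(gamma_ - 1)."""
    p = m * math.log(n) + 2.5 * math.log(m) + math.log(2.0)
    return p ** gamma_ + math.gamma(gamma_ + 1.0) * 2.0 ** (gamma_ - 1.0)
```

Setting `gamma_=1.0` gives the bound on the expectation and `gamma_=2.0` the bound on the variance; the term $m \log n$ comes from the estimate $\binom{n}{m} \le n^m$ applied to (44).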



Corollary 3 recovers the main result in [11]. However, it still does not fully exhaust the potential power of Theorem 1 and Lemma 9: indeed, in Section 11 we will further strengthen Corollary 3 and show that $E[\log \mathscr{C}(A)]$ is asymptotically bounded by $m \log 2$ for arbitrary $n > m$, and by $O(m^\gamma)$ for any $\gamma > 0$ when $n > 5m$.

Remark 2 Let us briefly remark here that the main reason for the appearance of the binomial term $\binom{n}{m}$ in the bound of Theorem 1 is a lack of proper understanding of $\mathscr{D}(A_i \,\|\, P = p)$, where $P$ is the centre of an LCP of $A = [A_1, \dots, A_n]^T$. We suspect that knowledge of this conditional distribution would make it possible to replace $\binom{n}{m}$ by a polynomial term in $m$ and $n$. If this hunch were true then the bound of Corollary 3 would of course become logarithmic in $m$ and $n$.

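Let us finally investigate the moments of $\mathscr{C}(A)$ itself, which is interesting in its own right for reasons mentioned in the introduction.

Corollary 4 Let $A$ be a uniform random $n \times m$ matrix where $n > m > 3$. Then

$$E[\mathscr{C}(A)^\gamma] \;\begin{cases} = +\infty & \text{if } \gamma \ge 1,\\[2pt] \le 1 + \binom{n}{m} 2 m^{5/2} \dfrac{\gamma}{1 - \gamma} & \text{if } \gamma \in (0, 1). \end{cases}$$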



Proof. Theorem 2 shows that for all t > (cos(7r/4))~T, 



1 , \ n—m 



Therefore, 



POO 

E ["^{Ay] = / p ["^{Ay > t] dt 

Jo 

poo 

> _ c{m)2"'-"t~^dt = 00 V7>1. 

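On the other hand, using Theorem 1, we find that for $\gamma \in (0, 1)$,

$$E[\mathscr{C}(A)^\gamma] = \int_0^\infty P[\mathscr{C}(A)^\gamma > t]\,dt \le 1 + \binom{n}{m} 2 m^{5/2} \int_1^\infty t^{-1/\gamma}\,dt = 1 + \binom{n}{m} 2 m^{5/2} \frac{\gamma}{1 - \gamma}.$$

□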



Note that the moment bound of Corollary 4 grows exponentially in $n$ for $\gamma < 1$. This bound does not reflect the correct limiting behaviour, since we will show in Corollary 7 below that $\lim_{n \to \infty} E[\mathscr{C}(A)^\gamma] = 1$ occurs for $\gamma < 1$ and $m$ fixed.

10 Limit Theorems for $n \gg m$

In this section we investigate the behaviour of $\mathscr{C}(A)$ and $\log \mathscr{C}(A)$ in the situation where $n \gg m$. For example, we will show that for fixed $m$,

$$\mathscr{C}(A) \xrightarrow{\,n \to \infty\,} 1 \tag{47}$$

with probability 1, and

$$E[\log \mathscr{C}(A)] \xrightarrow{\,n \to \infty\,} 0. \tag{48}$$

Intuitively it is clear that this behaviour should be observed: whenever the system $Ax \le 0$, $x \neq 0$ contains a very large number of random constraints, the system should be infeasible and this infeasibility should be easy to detect algorithmically. Our results confirm this intuition.



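In the results below $m > 3$ is a fixed dimension, $(A_i)_{\mathbb{N}}$ denotes a sequence of i.i.d. random vectors with uniform distribution on the sphere, $\mathscr{D}(A_i) \sim \mathscr{U}(S^{m-1})$, and $(A^{[n]})_{\mathbb{N}}$ is the sequence of random matrices $A^{[n]} = [A_1, \dots, A_n]^T$.

Theorem 3 Let $(A_i)_{\mathbb{N}}$ and $(A^{[n]})_{\mathbb{N}}$ be as defined above. Then

$$P\left[\lim_{n \to \infty} \mathscr{C}(A^{[n]}) = 1\right] = 1.$$

Proof. For all $\omega \in \Omega$ and $n \in \mathbb{N}$ let $\operatorname{cap}(P_n(\omega), R_n(\omega))$ be a LCP of $A^{[n]}$. By virtue of Proposition 2 it suffices to prove that

$$P\left[\lim_{n \to \infty} R_n = 0\right] = 1. \tag{49}$$

Let $\rho \in (0, \pi/2)$ be a fixed radius. Since $S^{m-1}$ is compact, there exists a finite set of vectors $\{p_1, \dots, p_k\} \subset S^{m-1}$ such that $\bigcup_{i=1}^k \operatorname{cap}(p_i, \rho/2) = S^{m-1}$. By

$$CE_{i,n} = \{\omega \in \Omega : A_1, \dots, A_n \notin \operatorname{cap}(p_i, \rho/2)\}$$

let us denote the event that the $i$-th cap does not contain any of the $n$ first vectors of $(A_i)_{\mathbb{N}}$. Then

$$P[CE_{i,n}] = \left(\frac{I_{m-2}(\pi - \rho/2)}{I_{m-2}(\pi)}\right)^n,$$

and hence,

$$P\left[\bigcup_{i=1}^k CE_{i,n}\right] \le k \left(\frac{I_{m-2}(\pi - \rho/2)}{I_{m-2}(\pi)}\right)^n. \tag{50}$$

We now claim that

$$\{R_n > \rho\} \subseteq \bigcup_{i=1}^k CE_{i,n}. \tag{51}$$

In fact, if $\omega \in \big(\bigcup_{i=1}^k CE_{i,n}\big)^c$, the complement of $\bigcup_{i=1}^k CE_{i,n}$, then there exist indices $i \in \{1, \dots, k\}$ and $j \in \{1, \dots, n\}$ such that $P_n(\omega) \in \operatorname{cap}(p_i, \rho/2)$ and $A_j(\omega) \in \operatorname{cap}(p_i, \rho/2)$. Using the triangular inequality on the sphere we find $R_n \le \arccos(P_n(\omega) \cdot A_j(\omega)) \le 2 \cdot \rho/2$. This shows

$$\Big(\bigcup_{i=1}^k CE_{i,n}\Big)^c \subseteq \{R_n \le \rho\},$$

which is equivalent to our claim.

Now (50) and (51) show

$$P[R_n > \rho] \le k \left(\frac{I_{m-2}(\pi - \rho/2)}{I_{m-2}(\pi)}\right)^n.$$

Finally, since $n_1 < n_2$ implies $R_{n_1} \ge R_{n_2}$, we have

$$P\left[\lim_{n \to \infty} R_n > \rho\right] \le \lim_{n \to \infty} P[R_n > \rho] = 0.$$

Since this is true for all $\rho \in (0, \pi/2)$, (49) follows. □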
Corollary 5 Let $(A^{[n]})_{\mathbb{N}}$ be as above. Then

i) $\mathscr{C}(A^{[n]}) \to_p 1$,
ii) $\mathscr{C}(A^{[n]}) \Rightarrow 1$,
iii) $P\left[\lim_{n \to \infty} \log \mathscr{C}(A^{[n]}) = 0\right] = 1$,
iv) $\log \mathscr{C}(A^{[n]}) \to_p 0$,
v) $\log \mathscr{C}(A^{[n]}) \Rightarrow 0$,

where $\to_p$ denotes convergence in probability and $\Rightarrow$ denotes weak convergence.

Proof. These are all standard consequences of Theorem 3, see for example Theorem 25.2 in [4]. □

Using (44) and Corollary 5 one can analyse the asymptotic behaviour of $E[\mathscr{C}(A)]$ and $E[\log \mathscr{C}(A)]$, see Corollary 7 below. In the case of $\log \mathscr{C}(A)$ for example, one can show that $\lim_{n \to \infty} E[\log \mathscr{C}(A^{[n]})] = 0$, using Skorokhod's theorem. Remarkably, the estimates of Corollary 2 are strong enough to yield this result directly, without resort to Theorem 3.



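Corollary 6 Let $m$ be fixed and $(A_i)_{\mathbb{N}}$ and $(A^{[n]})_{\mathbb{N}}$ defined as above. Then

$$\lim_{n \to \infty} E\left[(\log \mathscr{C}(A^{[n]}))^\gamma\right] = 0 \quad \forall \gamma > 0.$$

In particular,

i) $\lim_{n \to \infty} E[\log \mathscr{C}(A^{[n]})] = 0$ and
ii) $\lim_{n \to \infty} \operatorname{VAR}(\log \mathscr{C}(A^{[n]})) = 0$.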



Proof. Since 2 '"^Z ^ is exponentially decreasing in n and 




$\binom{n}{m}$ increases only polynomially in $n$, the second term in the estimate of Corollary 2 tends to zero as $n$ tends to infinity. In addition, $f(m,n)$ tends to zero as $n$ tends to infinity by definition of $f(m,n)$. This proves the displayed formula and as a particular case part i). Part ii) follows from the display and the fact that $\operatorname{VAR}(\log \mathscr{C}(A)) \le E[(\log \mathscr{C}(A))^2]$. □

Let us now analyse the asymptotic behaviour of $E[\mathscr{C}(A)]$. Recall from Corollary 4 that $E[\mathscr{C}(A)^\gamma]$ is finite if and only if $\gamma < 1$.

Corollary 7 Let $m$ be fixed and $(A_i)_{\mathbb{N}}$ and $(A^{[n]})_{\mathbb{N}}$ defined as above. Then

$$\lim_{n \to \infty} E\left[(\mathscr{C}(A^{[n]}))^\gamma\right] = 1, \quad \forall \gamma \in [0, 1).$$



Proof. The result is trivial for 7 = 0. Let us therefore assume that 7 e (0, 1). Let to G R+ be large 

enough so that 

Im-2 (arccosf"^ ) o 

1 ^ — f-. — ^<T, yt>to. 



Since "^(AI"]) — >p 1 by Corollary 5, for all e > there exists a number rie € N such that 



> (1 + e)^ 



Therefore, for n> n^, 



E 



< / / "^^(ylW) > (1 + e)- 

Thml,(52) , . , . 

< (l + e)^ +e[to-{l + €)yj + 
= (1 + e)^ + e (to - (1 + e)^) + 2n'"mi 



< e, Vn > Tie- 



dt 



•00 

dt+ I P 

to 



dt 



^j2m5 Q 1 / t~^dt 



to 



O \ tn 



^ - 1 

7 



(52) 



Taking limits as n ^ 00 and observing that e > was arbitrary, the claim follows. 



11 Limit Theorems for $m \gg 1$

In this section we investigate the behaviour of $\log \mathscr{C}(A)$ in the situation where $m \gg 1$. We will see that $\limsup_{m \to \infty} E[\log \mathscr{C}(A)]/m \le \log 2$, and we point out why we suspect that the correct value of this limit is zero.

Let $(\rho_m)_{\mathbb{N}} \subset (0, \pi/2]$ be a sequence such that $\lim_{m \to \infty} \rho_m = \pi/2$, and let $(A^{[m]})_{\mathbb{N}}$ be a sequence of random vectors such that $A^{[m]} \sim \mathscr{U}(S^{m-1})$. It can be shown that if $\rho_m$ converges to $\pi/2$ at an algebraic rate as a function of $m$, then



hm P 



0. 



This effect is a special case of the so-called concentration of measure phenomenon, see e.g. [26]. The phenomenon is remarkable, because it implies that after fixing an equator by choosing an arbitrary great circle on a high dimensional sphere, one will observe that a counter-intuitively high proportion of uniformly drawn sample points from that sphere lie in a very narrow neighbourhood around that equator. This phenomenon affects the analysis of the distribution tails of $\mathscr{C}(A)$ for large $m$.
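The concentration effect described above is easy to observe in a small simulation. The sketch below is illustrative only: it samples uniform points on $S^{m-1}$ by normalising Gaussian vectors and counts how many lie within a band $|x \cdot p| \le \varepsilon$ around the equator orthogonal to $p = e_1$. The band width, sample size and seed are arbitrary choices.

```python
import math
import random


def equator_band_fraction(m, eps, samples=1000, seed=0):
    """Fraction of uniform samples on S^{m-1} whose first coordinate lies in [-eps, eps]."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        norm = math.sqrt(sum(x * x for x in v))
        if abs(v[0]) / norm <= eps:
            inside += 1
    return inside / samples
```

For $\varepsilon = 0.1$ roughly a tenth of the points fall in the band when $m = 3$, whereas for $m = 500$ almost all of them do, since the first coordinate then has standard deviation about $1/\sqrt{m}$.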

However, for the purposes of this analysis it suffices to know that for any fixed real exponent $\gamma > 0$,

$$\lim_{m \to \infty} \frac{I_{m-2}(\arccos e^{-m^\gamma})}{I_{m-2}(\pi)} = \frac{1}{2}. \tag{53}$$

An elementary proof of this fact can be found in Lemma 10 of Appendix A.
Corollary 8 For all $(m, n)$ such that $m < n$ let $A^{[m,n]}$ be a uniform random $n \times m$ matrix. Then

$$\limsup_{m \to \infty} \left(\sup_{n > m} \frac{E[\log \mathscr{C}(A^{[m,n]})]}{m}\right) \le \log 2.$$

Proof. Let e > be a small real number and X a binomially distributed random variable X ~ 
Bin(m + fc, (1 — e)/2). If <& denotes the cumulative distribution function of the standard normal distri- 
bution, then the central limit theorem shows that 



m + k\ / 1 + e 
m 



1 -e 



P [X = m] 



< 



1-e 
2 

1-e 
2 

1-e 
1 



(m+fc)(l^e) 



2 



fc — 1 + e(m + fc) m— fc+l + e(m + fc) 



V'(m + fc)(l-e2) ' ^(m + fc)(l-e2) 



TO — fc + 1 + e(TO + fc) 
2 



TO — fc — 1 + e(TO + fc) 

^(TO + fc)(1^7^ 



(54) 



V'(m + fc)(l-e2) 



exp < mm 

27r I 2 



(m-fc + l + e(TO + fc))^ (TO-fc-l + e(TO + fc))^ 



(m + fc)(l-e2 



(m + fc)(l-e2) 



The approximate equality (54) becomes asymptotically exact. Therefore, there exists a number m\ € IN 
such that for all to > mi we have 



TO + fc\/l + ey / 2 12 
— — <exp<^TOlog- h-log- 



1 



1 



2 log(TO + fc) — - min 



(to — fc + 1 + e(TO + fc))2 (to — fc — 1 + e(TO + fc))^ 



(55) 



(TO + fc)(l-e2) ' (m + fc)(l-e2) 
Moreover, (53) shows that there exists a number to^ e IN such that for all m > we have 

Im-2 (arccosc^^^ 



< 



1 + e 



/m-2(7r) - 2 

Equations (44), (55) and (56) show that for all to > max(TO^, to^), n = to + fc > to and t > \/rn^ 

[log'^(A[™'" 



(56) 



> t 
2 



^3 5 
< exp \ m log Y— ^ 2 ^ ~'~ 2 ^ 



Lemma 9 implies that for the same parameters. 



E 



log<r(A[™'"l) 



< max TO log 



1-e 



3 5 

2 log2 + 2 logm, ) + 1. 



Since e was arbitrary, the claim follows. 



□ 



The asymptotic linearity in $m$ of the bound on $E[\log \mathscr{C}(A)]$ derived in the above corollary is due to the appearance of a binomial term in (44). This is largely an artifact of our specific analysis, and if the hunch of Remark 2 is true, then the asymptotic behaviour for $n > m \gg 1$ is given by

$$E[\log \mathscr{C}(A)] = O(m^\gamma) \tag{57}$$

for any arbitrarily small real exponent $\gamma > 0$. However, it will not be logarithmically small in $m$, because of the concentration of measure phenomenon. Thus, if the hunch is true, then (57) describes the asymptotic behaviour for arbitrary $n > m$ and $m \gg 1$. On the other hand, when $n > 5m$, we can actually prove that (57) holds true.

Corollary 9 Let $\{A^{[m,n]}\}$ be the set of random matrices defined in Corollary 8. Then for any real exponent $\gamma > 0$,

$$\limsup_{m \to \infty} \left(\sup_{n > 5m} \frac{E[\log \mathscr{C}(A^{[m,n]})]}{m^\gamma}\right) = 0,$$

that is, $E[\log \mathscr{C}(A^{[m,n]})]$ grows more slowly than any algebraic function of $m$ when $n > 5m$.

Proof. We need to consider (55) again. For k > Am, and e small enough we have 

' (m - + 1 + e(m + k)f (m - fc - 1 + e(m + k)f 



m log < - min 



1-e 2 V (m + fc)(l-e2) ' (m + fc)(l-e2) 
and then 

m + k\ fl + e\^ fl, 2 



< exp <^ - log - \ (58) 



m J \ 2 J - [2 °7r 
for all m>m\. Moreover, (53) shows that there exists a number such that for all m> 



/to_2 farccose 1 , , 

— — - ^ 4^- 

-'m-2(7r) 2 

Equations (44), (56), (58) and (59) together imply that 

P [log'^(^['"'"l) > < exp 1^ log2 - i logTT +^\ogm-t 

for all n > 5m, m > mi = max{ml,m^) and t > mi . Finally, applying Lemma 9, we get 



E [log'^(A['"'"l) 



3 15 
< 1 + max ( - log 2 — - log + 2 "^i = ] . 



Dividing by m'^ and taking limits, the result follows. 



12 A Final Remark About the Case n < m 

The development in the previous sections assumes n> m. The case n < m has been dealt with in [11], 
where it is proved that 

E[log'^(A)] < ^logn + 2. 
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Appendix A: A Concentration of Measure Inequality 



Let X ~ '^(S"-i) and p e S™"!. Note that for p < n/2, 

Im-2{p) 



P[Xecap(p,/))] 



Im-2{'n-) 



(60) 



where the last Une holds if 



2 J^^ sin™-2 TdT + J; sin™-2 rdr 

_ J^' Sin-' rdr 
f^siiT-^rdr 

Now, for $\rho \ll \pi/2$ we have $\int_0^\rho \sin^{m-2} \tau \, d\tau = O(\rho^{m-1})$, and hence, $P[X \in \operatorname{cap}(p, \rho)] = O(\rho^{m-1})$ as one would expect. Likewise, one expects intuitively that if

$$\frac{\pi}{2} - \rho \ll 1 \tag{61}$$

then

$$P[X \in \operatorname{cap}(p, \rho)] = \frac{1}{2} - O\left(\frac{\pi}{2} - \rho\right), \tag{62}$$

and this is indeed the case. However, it is somewhat surprising that for large $m$, the expression

$$\frac{\pi}{2} - \rho \tag{63}$$

has to be extremely small indeed before the order (62) is observed. In fact, if (63) decreases to zero at an algebraic rate as a function of the dimension $m$, then $P[X \in \operatorname{cap}(p, \rho)]$ converges to zero. We are not going to prove this property here, although an elementary proof can be given along the lines of Lemma 10 below, but we remark that this is a special case of a type of properties of high-dimensional probability distributions that are jointly referred to as the concentration of measure phenomenon. See e.g. [26] for a good account of this theory. The purpose of this appendix is in some sense to get around the adverse effects of the concentration of measure phenomenon and to show that if the expression (63) is exponentially small in terms of $m$, then (62) is asymptotically observed. In fact, we are going to prove a slightly weaker result which is sufficient for the purposes of the analysis of Section 11.

Lemma 10 Let $\gamma > 0$ be a constant, $p \in S^{m-1}$ and $X \sim \mathscr{U}(S^{m-1})$. Then

$$\lim_{m \to \infty} P\left[X \in \operatorname{cap}\left(p, \arccos e^{-m^\gamma}\right)\right] = \frac{1}{2}.$$



Proof. Let e (0, 1/2) and let e IN be such that 

1 



^ ^ me -3 

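Then, for $m > m_\theta$ equation (60) implies that

\[
P[X \in \mathrm{cap}(p,\rho)] < \theta \;\Longrightarrow\; \int_0^{\rho} \sin^{m-2}\tau\,d\tau < 2\theta \int_0^{\pi/2} \sin^{m-2}\tau\,d\tau. \tag{64}
\]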
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It is easy to show by induction and partial integration that 



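\[
\int_{\rho}^{\pi/2} \sin^{m}\tau\,d\tau =
\begin{cases}
\dfrac{1}{m}\cos\rho\,\sin^{m-1}\rho + \displaystyle\sum_{k=0}^{(m-3)/2} \dfrac{(m-1)(m-3)\cdots(m-2k-1)}{m(m-2)\cdots(m-2k-2)}\,\cos\rho\,\sin^{m-2k-3}\rho & \text{if $m$ is odd,} \\[2ex]
\dfrac{1}{m}\cos\rho\,\sin^{m-1}\rho + \displaystyle\sum_{k=0}^{(m-4)/2} \dfrac{(m-1)(m-3)\cdots(m-2k-1)}{m(m-2)\cdots(m-2k-2)}\,\cos\rho\,\sin^{m-2k-3}\rho \\[1ex]
\qquad {} + \dfrac{(m-1)(m-3)\cdots 3\cdot 1}{m(m-2)\cdots 2}\left(\dfrac{\pi}{2} - \rho\right) & \text{if $m$ is even.}
\end{cases} \tag{65}
\]

In particular,

\[
\int_0^{\pi/2} \sin^{m}\tau\,d\tau =
\begin{cases}
\dfrac{(m-1)(m-3)\cdots 4\cdot 2}{m(m-2)\cdots 3\cdot 1} & \text{if $m$ is odd,} \\[2ex]
\dfrac{(m-1)(m-3)\cdots 3\cdot 1}{m(m-2)\cdots 2}\cdot\dfrac{\pi}{2} & \text{if $m$ is even.}
\end{cases} \tag{66}
\]

It follows from (65) and (66) that

\[
\int_{\rho}^{\pi/2} \sin^{m}\tau\,d\tau \le
\begin{cases}
\dfrac{(m-1)(m-3)\cdots 2}{m(m-2)\cdots 1}\cdot\dfrac{1 - \sin^{m+1}\rho}{\cos\rho}
= \dfrac{1 - \sin^{m+1}\rho}{\cos\rho} \displaystyle\int_0^{\pi/2} \sin^{m}\tau\,d\tau & \text{if $m$ is odd,} \\[2ex]
\left(\dfrac{2\left(\sin\rho - \sin^{m+1}\rho\right)}{\pi\cos\rho} + 1 - \dfrac{2\rho}{\pi}\right) \displaystyle\int_0^{\pi/2} \sin^{m}\tau\,d\tau & \text{if $m$ is even,}
\end{cases}
\]

\[
\le \frac{(1 - \sin^{m+1}\rho) + 2(1 - \sin^{2}\rho)}{\cos\rho} \int_0^{\pi/2} \sin^{m}\tau\,d\tau \quad \text{(in both cases),}
\]

where the last inequality holds at least for $\pi/4 < \rho < \pi/2$. Therefore,

\[
\int_0^{\rho} \sin^{m}\tau\,d\tau = \int_0^{\pi/2} \sin^{m}\tau\,d\tau - \int_{\rho}^{\pi/2} \sin^{m}\tau\,d\tau
\ge \left[1 - \frac{(1 - \sin^{m+1}\rho) + 2(1 - \sin^{2}\rho)}{\cos\rho}\right] \left(\int_0^{\pi/2} \sin^{m}\tau\,d\tau\right). \tag{67}
\]

Note that the combination of (64) and (67) implies that for $\pi/4 < \rho < \pi/2$,

\[
2\theta < 1 - \frac{(1 - \sin^{m+1}\rho) + 2(1 - \sin^{2}\rho)}{\cos\rho} \;\Longrightarrow\; P[X \in \mathrm{cap}(p,\rho)] > \theta. \tag{68}
\]

Now let the sequence $(\rho_m)_{\mathbb{N}}$ be defined by

\[
\rho_m = \arccos e^{-m\gamma},
\]

and note that, since $\cos\rho_m = e^{-m\gamma}$, $\sin^{2}\rho_m = 1 - e^{-2m\gamma}$ and, by Bernoulli's inequality, $\sin^{m+1}\rho_m = (1 - e^{-2m\gamma})^{(m+1)/2} \ge 1 - \frac{m+1}{2}\,e^{-2m\gamma}$,

\[
1 - \frac{(1 - \sin^{m+1}\rho_m) + 2(1 - \sin^{2}\rho_m)}{\cos\rho_m} \ge 1 - \frac{m+5}{2}\,e^{-m\gamma} \longrightarrow 1 \quad (m \to \infty).
\]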



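Therefore, for $m > m_\theta$ large enough, $\rho_m \in (\pi/4, \pi/2)$ and the condition on the left hand side of (68) is 
satisfied. This shows that

\[
\lim_{m \to \infty} P\left[X \in \mathrm{cap}\left(p, \arccos e^{-m\gamma}\right)\right] \ge \theta,
\]

and since this is true for any $\theta \in (0, 1/2)$, this proves that $\lim_{m \to \infty} P\left[X \in \mathrm{cap}\left(p, \arccos e^{-m\gamma}\right)\right] \ge \frac{1}{2}$. 
Moreover, the inequality $\lim_{m \to \infty} P\left[X \in \mathrm{cap}\left(p, \arccos e^{-m\gamma}\right)\right] \le \frac{1}{2}$ is trivial, and the result follows. □ 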
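
The concentration effect discussed in this appendix can also be observed numerically. The following Python sketch is an illustration only and not part of the proof: it evaluates the cap probability as the ratio of $\int_0^{\rho} \sin^{m-2}\tau\,d\tau$ to $2\int_0^{\pi/2} \sin^{m-2}\tau\,d\tau$ using composite Simpson quadrature. The function names, the step count and the sample gap $\pi/2 - \rho = m^{-1/4}$ are assumptions made for this example.

```python
import math


def sin_power_integral(k, upper, steps=20000):
    """Composite Simpson approximation of the integral of sin(t)**k over [0, upper]."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of subintervals
    h = upper / steps
    total = math.sin(0.0) ** k + math.sin(upper) ** k
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * math.sin(i * h) ** k
    return total * h / 3.0


def cap_probability(m, rho):
    """P[X in cap(p, rho)] for X uniform on S^(m-1), with m >= 2 and 0 <= rho <= pi/2."""
    numerator = sin_power_integral(m - 2, rho)
    denominator = 2.0 * sin_power_integral(m - 2, math.pi / 2)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical experiment: a gap pi/2 - rho that shrinks only algebraically in m.
    for m in (10, 100, 1000):
        gap = m ** -0.25
        print(m, gap, cap_probability(m, math.pi / 2 - gap))
```

With an algebraically shrinking gap the printed probabilities fall towards zero as $m$ grows, even though the cap approaches a half-sphere, which is the behaviour the appendix describes.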






13 Appendix B: Complexity of the Relaxation Method 



The purpose of this appendix is to make a convincing argument that the complexity of relaxation methods 
for the solution of the linear system $Ax < 0$, $x \ne 0$ is proportional to $\mathscr{C}(A)^2$. In a sense, this fact is 
common knowledge among researchers familiar with both the relaxation method and condition numbers, as 
conversations with Marina Epelman, Rob Freund and Dan Spielman confirmed. Moreover, this fact has 
been implicitly stated in the relaxation method literature for decades, and all it takes to make it explicit 
is to translate well-known results from the relaxation method literature into the language of condition 
numbers. For lack of an explicit reference, let us give such an example here. 

The algorithm we consider is the so-called perceptron algorithm [35]. When applied to solving a strictly 
feasible system $Ax < 0$, where $A \in \mathbb{R}^{n \times m}$ has unit row vectors, this algorithm starts from an initial 
point $x_0 \in \mathbb{R}^m$ and constructs an iterative sequence of points $(x_i)_{\mathbb{N}} \subset \mathbb{R}^m$ as follows: if $Ax_i < 0$ then 
$x_i$ is a solution and the algorithm stops. Otherwise, a row vector $a_j$ of $A$ is chosen so that $a_j x_i \ge 0$, 
and the next point is computed by $x_{i+1} = x_i - a_j$. 

The usual convergence analysis then proceeds as follows, see e.g. [5]: let $w \in \mathbb{R}^m$ be a solution of 
$Ax < 0$, let $\alpha = \max_{1 \le j \le n} a_j w$, where $a_j$ is the $j$-th row vector of $A$, and let $w^* = w/|\alpha|$. It is then easy 
to show that if the algorithm has not stopped before or during iteration $i$, that is if $x_{i+1} \ne x_i$, then 

\[
\|x_{i+1} - w^*\|^2 \le \|x_i - w^*\|^2 - 1.
\]
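
One way to verify this inequality is sketched below, using only the assumptions already stated: the rows of $A$ have unit norm, the row $a_j$ used in the update satisfies $a_j x_i \ge 0$, and $w$ is a strict solution, so that $\alpha < 0$ and hence $a_j w^* = a_j w/|\alpha| \le \alpha/|\alpha| = -1$. The label $a_j$ for the chosen row is a naming assumption of this sketch.

```latex
\begin{align*}
\|x_{i+1} - w^*\|^2
  &= \|(x_i - w^*) - a_j\|^2 \\
  &= \|x_i - w^*\|^2 - 2\,a_j x_i + 2\,a_j w^* + \|a_j\|^2 \\
  &\le \|x_i - w^*\|^2 - 0 - 2 + 1 \\
  &= \|x_i - w^*\|^2 - 1 .
\end{align*}
```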

This shows that at most $\|x_0 - w^*\|^2$ iterations can take place. For simplicity, we can choose $x_0$ to be the 
origin, so that the algorithm has complexity $O(\|w^*\|^2)$. 
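
The iteration and its bound can be illustrated with a small Python sketch. It is a minimal sketch rather than a reference implementation: the helper names, the rule of picking the first violated row, the safety cap `max_updates` and the sample system are all assumptions made for this example.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))


def normalise_rows(rows):
    """Scale every row to Euclidean norm 1, as the analysis assumes."""
    return [[a / math.sqrt(dot(row, row)) for a in row] for row in rows]


def perceptron(A, max_updates=1_000_000):
    """Search for x with A x < 0 componentwise, starting from the origin.

    Returns (x, updates). The cap `max_updates` is a safeguard added for
    this sketch; it is not part of the algorithm described in the text.
    """
    x = [0.0] * len(A[0])
    for updates in range(max_updates):
        # Pick the first row a_j with a_j x >= 0 (any such row would do).
        violated = next((row for row in A if dot(row, x) >= 0.0), None)
        if violated is None:
            return x, updates
        x = [xi - ai for xi, ai in zip(x, violated)]
    raise RuntimeError("update cap reached; the system may not be strictly feasible")


def update_bound(A, w):
    """||w*||^2 = ||w||^2 / alpha^2, where alpha = max_j a_j w for a strict solution w."""
    alpha = max(dot(row, w) for row in A)
    if alpha >= 0.0:
        raise ValueError("w is not a strict solution of A x < 0")
    return dot(w, w) / alpha ** 2


if __name__ == "__main__":
    A = normalise_rows([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    x, updates = perceptron(A)
    print(x, updates, update_bound(A, [-1.0, -1.0]))
```

Any strict solution `w` gives a valid bound through `update_bound`; the smallest bound is obtained for the `w` that makes `alpha` as negative as possible relative to the norm of `w`.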

Now note that $\alpha = \cos\theta(A, w)$, where we use the notation of Section 3. Without loss of generality we 
may assume that $w$ is a unit vector, so that $\|w^*\| = |\cos\theta(A, w)|^{-1}$. In order to minimise the complexity 
estimate, we need to choose $w = \bar{x}$, so that we find that the algorithm terminates after at most $\mathscr{C}(A)^2$ 
iterations. 
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